On 6/16/2015 22:03, Osma Suominen wrote:
Here's a slightly relevant discussion about how to support something like pre-bound variables / parametrized queries in YASQE, a graphical SPARQL editor component in the YASGUI suite (and used by Fuseki among others): https://github.com/YASGUI/YASQE/issues/24

Thanks for the pointer.


I'm not sure I understand all the issues here very deeply, but it would seem useful to have a standard way of expressing and executing parametrized SPARQL queries, which could then be applied by YASQE and SHACL among others.

Indeed. Maybe the SHACL templates [1] could be one solution to that, assuming SHACL becomes a W3C standard. In the current draft you would specify a template as

ex:MyTemplate
    a sh:Template ;
    rdfs:label "My template" ;
    rdfs:comment "Gets a list of all people born in a given country" ;
    sh:argument [
        sh:predicate ex:country ;
        sh:valueType schema:Country ;
        rdfs:comment "The country to get all people for" ;
    ] ;
    sh:sparql """
        SELECT ?person
        WHERE {
            ?person ex:bornIn ?country .
        } """ ;
.

This structure provides enough metadata to drive user interfaces, e.g. input forms where users select a country from a list. The semantics in the current draft are that the arguments become pre-bound variables (ex:country -> ?country). This approach has the advantage that each query can be instantiated as an ordinary, valid RDF instance, e.g.

ex:ExampleQuery
    a ex:MyTemplate ;
    ex:country ex:Germany .

This can then be used as a high-level language for all kinds of query invocations - constraints, rules or whatever - experts can prepare the SPARQL while end users just fill in the blanks.

The semantics are intended to be like inserting a VALUES clause at the "beginning" of the query, i.e. the bindings would not be visible in sub-selects etc. In contrast to text-substitution approaches, this also ensures that queries are always syntactically valid and can be pre-compiled.
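
To make that concrete, instantiating ex:ExampleQuery is meant to behave like the following query (the ex: namespace IRI is of course invented here); a quick way to convince yourself that such a query stays syntactically valid is to hand it straight to a parser, e.g. Jena's:

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;

public class TemplateInstantiationSketch {
    public static void main(String[] args) {
        // The intended effect: a one-row VALUES table for ?country joined
        // at the top level of the template's SELECT query.
        String effective =
            "PREFIX ex: <http://example.org/ns#>\n" +
            "SELECT ?person\n" +
            "WHERE {\n" +
            "    VALUES ?country { ex:Germany }\n" +
            "    ?person ex:bornIn ?country .\n" +
            "}";
        Query query = QueryFactory.create(effective);   // parses without errors
        System.out.println(query);
    }
}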

Holger

[1] http://w3c.github.io/data-shapes/shacl/#templates


-Osma




On 16/06/15 12:51, Andy Seaborne wrote:
On 16/06/15 09:33, Holger Knublauch wrote:
Thanks, Andy.

On 6/16/15 6:03 PM, Andy Seaborne wrote:
On 16/06/15 04:20, Holger Knublauch wrote:
Hi,

(this question is motivated by the ongoing Data Shapes WG, but I don't
speak on their behalf).

Ptr?
http://w3c.github.io/data-shapes/shacl/

esp http://w3c.github.io/data-shapes/shacl/#sparql-constraints-prebound

http://www.w3.org/2014/data-shapes/track/issues/68

Thanks.





Jena and other APIs such as Sesame support the concept of pre-binding
variables prior to SPARQL execution, using
QueryExecution.setInitialBinding(). This is convenient for reusing
parameterized queries, especially with blank nodes.
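
For illustration, a minimal sketch of that usage (the package names assume Apache Jena 3.x, i.e. org.apache.jena.*; the data file and IRIs are made up):

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class PreBindingSketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("data.ttl");   // hypothetical data file

        Query query = QueryFactory.create(
            "SELECT ?person WHERE { ?person <http://example.org/bornIn> ?country }");

        // Pre-bind ?country before execution; the query text itself is never rewritten.
        QuerySolutionMap initial = new QuerySolutionMap();
        initial.add("country", model.createResource("http://example.org/Germany"));

        try (QueryExecution qexec = QueryExecutionFactory.create(query, model, initial)) {
            ResultSet results = qexec.execSelect();
            results.forEachRemaining(row -> System.out.println(row.get("person")));
        }
    }
}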

Question: is there any formal basis for this functionality, formulated
so that it can be implemented by other platforms too? I can see that it
populates the original bindings that are passed through the algebra
objects, but what would be the best way to explain this by means of
concepts from the SPARQL 1.1 spec?

Thanks
Holger


There are two possible explanations - they are not quite the same.

1/ It's a substitution of a variable by a value before execution. This
is very like parameterized queries. It's a pre-execution step.

Do you mean syntactic insertion like the ParameterizedQuery class? That
would not support bnodes, and the shapes and focus nodes of a SHACL
constraint will frequently be bnodes. It should also avoid repeated
query parsing; for performance reasons it would be better to operate on
Query objects and their general equivalents (algebra objects).
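
(For reference, the text-level class in Jena is called ParameterizedSparqlString; a rough sketch of that approach with made-up IRIs - the value is spliced into the query string, which is then re-parsed, so a genuine blank node cannot be passed this way:)

import org.apache.jena.query.ParameterizedSparqlString;
import org.apache.jena.query.Query;

public class TextSubstitutionSketch {
    public static void main(String[] args) {
        ParameterizedSparqlString pss = new ParameterizedSparqlString(
            "SELECT ?person WHERE { ?person <http://example.org/bornIn> ?country }");
        // An IRI (or literal) can be substituted; a real blank node cannot
        // survive the round trip through query text.
        pss.setIri("country", "http://example.org/Germany");

        Query query = pss.asQuery();   // the string is parsed again here
        System.out.println(query);
    }
}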

Substitution does not have to be in syntax - it's rewriting the AST with
the real, actual bnode.

2/ VALUES

There is a binding as a one-row VALUES table and it is joined into the
query as usual.

I guess inserting a VALUES clause at the beginning would work, but then
again, what about bnodes? I guess that instead of the VALUES keyword (as
a string), it would need to rely on the equivalent algebra object?
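
Something along these lines, perhaps - a rough sketch only, using ARQ's algebra classes as I understand them (names and signatures may be slightly off): compile the pattern, then join in a one-row table, which can hold a blank node directly.

import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.algebra.TableFactory;
import org.apache.jena.sparql.algebra.op.OpJoin;
import org.apache.jena.sparql.algebra.op.OpTable;
import org.apache.jena.sparql.core.Var;

public class AlgebraJoinSketch {
    public static void main(String[] args) {
        Query query = QueryFactory.create(
            "SELECT ?person WHERE { ?person <http://example.org/bornIn> ?country }");

        // Compile only the WHERE pattern; a complete implementation would put
        // the projection and other solution modifiers back around the result.
        Op pattern = Algebra.compile(query.getQueryPattern());

        // A one-row, one-column table for ?country. The value can be a real
        // blank node, which could never be written into the query string.
        Node focus = NodeFactory.createBlankNode("b0");
        Op row = OpTable.create(TableFactory.create(Var.alloc("country"), focus));

        Op joined = OpJoin.create(row, pattern);   // the "VALUES at the top" effect
        System.out.println(joined);                // execute with Algebra.exec(joined, graph)
    }
}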

Just to be clear, this only needs to work on local datasets, not
necessarily with SPARQL endpoints where all we have is an HTTP string
interface. I am looking for a couple of sentences that would provide a
generic implementation strategy that most SPARQL engines either already
have, or could easily add, to support SHACL.

Thanks
Holger


Firstly - I'm talking about principles and execution, not syntax. VALUES
is the way to get a data table into a SPARQL execution.
setInitialBinding happens after parsing - injecting the preset row into
execution.

The real (first) issue with blank nodes isn't putting them back in a
query; it's getting them in the first place.

As soon as a blank node is serialized in any of the W3C formats (RDF
syntaxes, SPARQL result formats), it isn't the same blank node. There is
only an equivalent one in the document.

If you are thinking of local API use, where the results are never
serialized, then it's not an issue - like setInitialBinding, it's an API
issue. setInitialBinding works after parsing.

I'm afraid that section 12.1.1 is sliding towards mixing up syntax
issues with abstraction and execution.  To keep to standards, you have
to talk about SPARQL as a syntax.  You may get away with something like
"?this has the value from <how you found it>" or
"SPARQL execution must ensure that ?this has a value XXX in the
answers". Though XXX and blank nodes will cause the usual reactions. You
and I can probably macro-generate the debate ahead of time.
Perma-thread-37.

The perfect answer is (might be) to repeat the pattern that found ?this
in the first place. There are obvious efficiency issues if that is done
naively. But otherwise there is no way, within the standards alone, to
connect the results of one SPARQL query to another query.

[Now - may I do a 50% rules, 50% procedural language that wires together
multiple SPARQL queries and updates, please?]

ARQ's solution to this is <_:...> URIs.  They name the bnode and the
parser replaces them with the real blank node.
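
A small example (the label and IRIs are made up; whether the plain SPARQL 1.1 syntax or only Syntax.syntaxARQ accepts the form may depend on the version):

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.Syntax;

public class BlankNodeLabelSketch {
    public static void main(String[] args) {
        // The parser turns <_:b0> back into the blank node labelled "b0",
        // so a bnode found by one query can be named in the next one.
        Query query = QueryFactory.create(
            "SELECT ?p ?o WHERE { <_:b0> ?p ?o }", Syntax.syntaxARQ);
        System.out.println(query);
    }
}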

In fact, RDF 1.1 says:
[[
3.4 Blank Nodes

Blank nodes are disjoint from IRIs and literals. Otherwise, the set of
possible blank nodes is arbitrary. RDF makes no reference to any
internal structure of blank nodes.
]]
so you could say that, for the RDF abstract syntax, there is a 1-1
labelling by UUID of all bnodes in use (i.e. finite - none of this
axiom-of-choice stuff) and just be done with it. Given the UUID, you can
find the blank node.

Some people mix up the RDF abstract syntax with the meaning of blank
nodes (entailment), but they are different. Abstract syntax == data structure.

As a data structure, blank nodes are just nodes in a graph. So invent a
reference for them (not a URI, not a literal). Every RDF system does
this anyway, even if it is only implicit, like a Java object reference (not
in Jena - blank nodes are equal by .equals, not ==; the usual Java stuff).

     Andy

Differences in these viewpoints can occur in nested patterns -
sub-queries (you can have different variables with the same name - a
textual substitution viewpoint breaks that) and OPTIONALs inside
OPTIONALs (bottom-up execution is not the same as top-down execution).
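
A small example of the sub-query case (IRIs made up): the inner ?s below is a different variable from the outer ?s, because only ?n is projected out of the sub-select. Substituting a value into the text for ?s would wrongly constrain both; seeding the execution with a binding is only meant to touch the outer one.

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;

public class SubSelectScopeSketch {
    public static void main(String[] args) {
        Query query = QueryFactory.create(
            "PREFIX ex: <http://example.org/>\n" +
            "SELECT ?s ?n\n" +
            "WHERE {\n" +
            "  ?s ex:p ?o .\n" +
            "  { SELECT (COUNT(*) AS ?n) WHERE { ?s ex:q ?o2 } }\n" +
            "}");
        // The two occurrences of ?s print identically but are distinct
        // variables as far as the algebra is concerned.
        System.out.println(query);
    }
}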

This has existed in ARQ for a very long time. ARQ actually takes the
initial binding and seeds the execution from there, so it's like (2)
but not exactly: it does respect non-projected variables inside nested
SELECTs; it does not completely respect certain cases of
OPTIONAL-inside-OPTIONAL.

[[
Actually - it isn't even as simple as that, as the optimizer is aware of
these tricky OPTIONAL-OPTIONAL cases and may do the right thing.

The case of nested optionals, with a variable mentioned only in the
innermost and outermost patterns but not the intermediate ones, is rare
in my experience, even for queries generated from compositions.
]]


    Andy





