Re: Definition of SPARQL variable pre-binding

Osma Suominen Tue, 16 Jun 2015 05:06:07 -0700

Here's a slightly relevant discussion about how to support something like pre-bound variables / parametrized queries in YASQE, a graphical SPARQL editor component in the YASGUI suite (and used by Fuseki among others): https://github.com/YASGUI/YASQE/issues/24

I'm not sure I understand all the issues here very deeply, but it would seem useful to have a standard way of expressing and executing parametrized SPARQL queries, which could then be applied by YASQE and SHACL among others.


-Osma




On 16/06/15 12:51, Andy Seaborne wrote:

On 16/06/15 09:33, Holger Knublauch wrote:

Thanks, Andy.

On 6/16/15 6:03 PM, Andy Seaborne wrote:

On 16/06/15 04:20, Holger Knublauch wrote:

Hi,

(this question is motivated by the ongoing Data Shapes WG, but I don't
speak on their behalf).


Ptr?

http://w3c.github.io/data-shapes/shacl/

esp http://w3c.github.io/data-shapes/shacl/#sparql-constraints-prebound

http://www.w3.org/2014/data-shapes/track/issues/68


Thanks.


Jena and other APIs such as Sesame support the concept of pre-binding
variables prior to SPARQL execution, using
QueryExecution.setInitialBinding(). This is convenient to reuse
parameterized queries, especially with blank nodes.

Question: is there any formal basis of this functionality,
formulated so
that it can be implemented by other platforms too? I can see that it
populates the original bindings that are passed through the algebra
objects, but what would be the best way to explain this by means of
concepts from the SPARQL 1.1 spec?

Thanks
Holger


There are two possible explanations - they are not quite the same.

1/ It's a substitution of a value for a variable before execution.  This is
very like parameterized queries. It's a pre-execution step.


Do you mean syntactic insertion like the ParameterizedQuery class? This
would not support bnodes, and the shapes and focus nodes of a SHACL
constraint will frequently be bnodes. It should also avoid repeated
query parsing; for performance reasons it would be better to operate on
Query objects and their general equivalents (Algebra objects).


Substitution does not have to be in syntax - it's rewriting the AST with
the real, actual bnode.
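
To illustrate substitution on the parsed query rather than on the query string, here is a minimal sketch in Python. It is not Jena or ARQ code: the Var and BNode classes, the tuple-based pattern and the substitute() helper are all hypothetical names invented for this example. The point is only that the value dropped into the pattern is the blank node object itself, so no label ever has to be written out and re-parsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A query variable such as ?this (hypothetical stand-in for an AST node)."""
    name: str


class BNode:
    """A blank node with no label; it is identified by object identity only."""


def substitute(pattern, binding):
    """Return a copy of the pattern with every pre-bound variable replaced."""
    return [
        tuple(binding.get(term, term) if isinstance(term, Var) else term
              for term in triple)
        for triple in pattern
    ]


focus = BNode()
pattern = [(Var("this"), "ex:name", Var("name"))]

# ?this is replaced by the actual blank node object; ?name stays a variable.
rewritten = substitute(pattern, {Var("this"): focus})
```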

2/ VALUES

There is a binding as a one row VALUES table and it's join'ed into the
query as usual.
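
The VALUES view can be sketched as an ordinary join between a one-row table and the solutions of the rest of the query. This is a toy Python model, assuming solutions are plain dicts from variable name to value; join() and compatible() are hypothetical helpers, not part of any SPARQL engine's API.

```python
def compatible(a, b):
    """Two solution rows are compatible if they agree on every shared variable."""
    return all(b[var] == value for var, value in a.items() if var in b)


def join(left, right):
    """SPARQL-style join: merge every compatible pair of rows."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# Solutions the query would produce without any pre-binding.
solutions = [
    {"this": "ex:a", "name": "Alice"},
    {"this": "ex:b", "name": "Bob"},
]

# The pre-binding, seen as a VALUES table with exactly one row.
initial = [{"this": "ex:b"}]

result = join(initial, solutions)
```

Only the row that agrees with the pre-bound value of ?this survives the join.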


I guess inserting a VALUES clause into the beginning would work, but
then again what about bnodes? I guess instead of the VALUES keyword (as
a string), it would need to rely on the equivalent algebra object?

Just to be clear, this only needs to work in local datasets, not
necessarily with SPARQL endpoints where all we have is a http string
interface. I am looking for a couple of sentences that would provide a
generic implementation strategy that most SPARQL engines either already
have, or could easily add to support SHACL.

Thanks
Holger


Firstly - I'm talking about principles and execution, not syntax. VALUES
is the way to get a data table into a SPARQL execution.
setInitialBinding happens after parsing - injecting the preset row into
execution.

The real (first) issue with blank nodes isn't putting them back in a
query; it's getting them in the first place.

As soon as a blank node is serialized in any W3C format (RDF, any
SPARQL results), it isn't the same blank node.  There is an equivalent
one in the document.
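
A toy round trip shows the effect. This is only a sketch under simplifying assumptions: serialize() and parse() are hypothetical helpers that mimic what a writer and a parser do with blank node labels, and BNode is an invented class whose instances are identified by object identity.

```python
class BNode:
    """A blank node identified only by object identity."""


def serialize(node, labels):
    """Writer side: hand out a document-local label for the node."""
    return labels.setdefault(node, f"_:b{len(labels)}")


def parse(label, nodes):
    """Parser side: allocate a fresh blank node for each label it meets."""
    return nodes.setdefault(label, BNode())


original = BNode()
text = serialize(original, {})   # the label is all that reaches the document
restored = parse(text, {})       # the reader builds its own node from the label
```

After the round trip, restored plays the same role in the parsed document but is a different node from original.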

If you are thinking of local API use, where the results are never
serialized, then it's not an issue - like setInitialBinding, it's an API
issue.  setInitialBinding is working after parsing.

I'm afraid that section 12.1.1 is sliding towards mixing up syntax
issues with abstraction and execution.  To keep to standards, you have
to talk about SPARQL as a syntax.  You may get away with something like
"?this has the value from <how you found it>" or
"SPARQL execution must ensure that ?this has a value XXX in the
answers". Though XXX and blank nodes will cause the usual reactions. You
and I can probably macro-generate the debate ahead of time.
Perma-thread-37.

The perfect answer is (might be) to repeat the pattern that found ?this
in the first place.  Obvious efficiency issues if done naively.  But
otherwise, there is no way to connect the results of one SPARQL query to
another query within the standards only.

[Now - may I do a 50% rules, 50% procedural language that wires together
multiple SPARQL queries and updates, please?]

ARQ's solution to this is <_:...> URIs.  They name the bnode and the
parser replaces them with the real blank node.

In fact, RDF 1.1 says:
[[
3.4 Blank Nodes

Blank nodes are disjoint from IRIs and literals. Otherwise, the set of
possible blank nodes is arbitrary. RDF makes no reference to any
internal structure of blank nodes.
]]
so you could say, for the RDF abstract syntax, there is a 1-1 labelling
of all bnodes in use (i.e. finite - none of this axiom-of-choice stuff)
by UUID and just be done with it.  Given the UUID, you can find the
blank node.
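
The 1-1 labelling idea can be sketched as a small registry. This is a hypothetical illustration in Python, not how ARQ implements its <_:...> URIs: BNodeRegistry, label_for() and node_for() are invented names, and the only real library call is uuid.uuid4() from the standard library.

```python
import uuid


class BNode:
    """A blank node identified only by object identity."""


class BNodeRegistry:
    """Keeps a 1-1 mapping between the blank nodes in use and UUID labels."""

    def __init__(self):
        self._label_by_node = {}
        self._node_by_label = {}

    def label_for(self, node):
        """Return the node's label, minting a new UUID on first use."""
        if node not in self._label_by_node:
            label = str(uuid.uuid4())
            self._label_by_node[node] = label
            self._node_by_label[label] = node
        return self._label_by_node[node]

    def node_for(self, label):
        """Given the UUID, find the blank node."""
        return self._node_by_label[label]


registry = BNodeRegistry()
node = BNode()
label = registry.label_for(node)
```

Given the label, registry.node_for(label) hands back the very same node, which is all a local API needs in order to put it back into a query.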

Some people mix RDF abstract syntax with meaning of blank nodes
(entailment) but they are different.  abstract syntax == data structure.

As a data structure, blank nodes are just nodes in a graph.  So invent a
reference for them (not a URI, not a literal).  Every RDF system does
anyway even if it is implicitly there like a java object reference (not
Jena - blank nodes are the same by .equals, not ==; usual java stuff here).

     Andy

Differences in these viewpoints can occur in nested patterns -
sub-queries (you can have different variables with the same name - a
textual substitution viewpoint breaks that) and OPTIONALs inside
OPTIONALs (bottom up execution is not the same as top down execution).
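
The sub-query case can be made concrete with a toy evaluator. This is a sketch in Python under heavy simplifying assumptions (string terms, variables marked by a leading "?", no OPTIONAL); match(), bgp(), join(), project() and run() are hypothetical helpers. The modelled query has an outer pattern ?x p ?y and a sub-query SELECT ?x { ?x q ?y }, so the inner ?y is not projected and is a different variable from the outer ?y.

```python
def match(pattern, triple, row):
    """Extend row so that pattern matches triple, or return None."""
    row = dict(row)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if row.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return row


def bgp(patterns, data):
    """Evaluate a basic graph pattern against a list of triples."""
    rows = [{}]
    for pattern in patterns:
        extended = []
        for row in rows:
            for triple in data:
                new_row = match(pattern, triple, row)
                if new_row is not None:
                    extended.append(new_row)
        rows = extended
    return rows


def join(left, right):
    """Merge every pair of rows that agree on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(r[k] == v for k, v in l.items() if k in r)]


def project(rows, names):
    """SELECT: keep only the named variables."""
    return [{k: r[k] for k in names} for r in rows]


data = [("a", "p", "1"), ("a", "q", "2")]
outer = [("?x", "p", "?y")]
inner = [("?x", "q", "?y")]  # this ?y is hidden by SELECT ?x


def run(outer_patterns, inner_patterns, initial):
    sub = project(bgp(inner_patterns, data), ["?x"])
    return join(initial, join(bgp(outer_patterns, data), sub))


def substitute_text(patterns):
    """Textual view: replace every ?y, including the hidden inner one."""
    return [tuple("1" if t == "?y" else t for t in p) for p in patterns]


by_join = run(outer, inner, [{"?y": "1"}])
by_text = run(substitute_text(outer), substitute_text(inner), [{}])
```

Here the join view keeps the solution with ?x = a, while the textual substitution also rewrites the inner ?y, so the sub-query no longer matches anything and the result is empty.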

This has existed in ARQ for a very long time.  ARQ actually takes the
initial binding and seeds the execution from there so it's like (2)
but not exactly; it does respect non-projected variables inside nested
SELECTs; it does not completely respect certain cases of
OPTIONAL-inside-OPTIONAL.


[[
Actually - it isn't even as simple as that as the optimizer is aware of
these tricky OPTIONAL-OPTIONAL cases and may do the right thing.

The case of nested optionals, with a variable being mentioned only in
the innermost and outermost patterns, but not intermediate ones, is
rare even for generated queries from compositions in my experience.
]]


    Andy



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
