Re: Carrying raw query strings (public API change).

Andy Seaborne Fri, 13 Apr 2012 10:25:38 -0700

On 13/04/12 17:00, Robert Vesse wrote:

We work at the level of QueryEngine but we have multiple
implementations: depending on the query, either the entire thing can
be handled by the backend or only parts of it can, so we override
eval() in our query engine implementations.


Then we either use a OpExecutor or just parcel the query off
wholesale to our backend.  So I was slightly inaccurate in that we no
longer use StageGenerator (though we did at one point)

Regardless we are still at a level of the API where we don't see the
QueryExecution so we couldn't utilize the context even if we wanted
to

The context gets everywhere. QueryEngineBase has it, as does ExecutionContext.

The context is the merge of the global and the dataset-specific context, then add in user settings. It will be available from ExecutionContext when it gets to the eval code.


ExecutionContext.getContext()

or I hope so - it's how the dataset and active graph get passed around to actually deliver data!
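
A rough plain-Java sketch of that layering, in case it helps picture it. This is a model only, not the ARQ Context class: the class name ContextLayers, the string keys, and the "later layer wins" precedence are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the layering described above. NOT the ARQ Context class.
final class ContextLayers {

    // Assumption: later layers override earlier ones
    // (global defaults, then dataset settings, then user settings).
    static Map<String, Object> merge(Map<String, Object> global,
                                     Map<String, Object> dataset,
                                     Map<String, Object> user) {
        Map<String, Object> merged = new HashMap<>(global);
        merged.putAll(dataset);
        merged.putAll(user);
        return merged;
    }

    public static void main(String[] args) {
        // All keys below are hypothetical.
        Map<String, Object> global = new HashMap<>();
        global.put("timeout", 60);
        global.put("optimize", true);

        Map<String, Object> dataset = new HashMap<>();
        dataset.put("timeout", 30);

        Map<String, Object> user = new HashMap<>();
        user.put("label", "my query");

        Map<String, Object> ctx = merge(global, dataset, user);
        System.out.println(ctx.get("timeout"));   // dataset layer overrides global
        System.out.println(ctx.get("optimize"));  // inherited from global
        System.out.println(ctx.get("label"));     // added by the user layer
    }
}
```

The point is only that code running deep in the engine sees one merged view and does not need to know which layer a setting came from.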

It's even available in custom functions - FunctionEnv is the interface it exposes but it is really the ExecutionContext object.


And it does the iterator tracking (have you met tracking yet? :-)

The more I think about this the less I think it actually solves our
problem (it being carrying raw query strings) because we are still
left with the issue that a query may turn into multiple queries
internally and our developers wanted the query string to associate
with each of those internal queries but that isn't a 1:1
relationship

Maybe it is for the best if I just go ahead and revert those
changes?

OK, no rush. ... and maybe open a JIRA if you think there is an architectural point here. Given your last comment (splitting queries) maybe there isn't, or isn't at the moment.


Stephen's API experiment may have something to say here.

        Andy


Rob

On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:

Rob -

On 12/04/12 17:43, Robert Vesse wrote:

The notion of jobs makes sense to me but it implies some
refactoring of our APIs that is simply not feasible in our current
setup. Where we use Fuseki this is not doable because we are
extending Fuseki indirectly by hooking into ARQ's
QueryExecutionFactory mechanism and so don't have any means to
create this Job thing prior to starting to see the actual query
in our ARQ integration layer.

Even in a hypothetical situation where we did have such
capability we still run into the issue that at some point the
query has to drop into the ARQ machinery to be processed at which
point it has to be a query and we'd lose any visibility back to
our Job notion anyway. This is especially true since the point at
which we actually send work off to our backend for processing is
potentially very low level in the ARQ API (as far down as the
Stage Generator layer)


This makes me a bit nervous; the needs of Cray to tunnel info from
one place to another because of current code structure balanced
against a long term change to the public API.

The good news is there is a better way in Fuseki.

The QueryExecution object is a one-time-use object and it has
somewhere to put such additional information - getContext().  This
is where the current time for the query goes for example.  It even
gets to the StageGenerator.  It's already got the query as an
object.

The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the
nearest thing I can see to the "Job" from the point of view of the
web request.

So we can have the QueryExecution carry a per-operation label.

Change:

1/ Add a new symbol: ARQ.queryLabel

2/ SPARQL_Query.executeQuery creates the QueryExecution and can set
the context with a key/value that is the query string as
ARQ.queryLabel.  It knows the queryStringLog -- it can take the
original query string as well, or we can put it in the
HttpActionQuery and put it in the execution context.
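
To make the two steps concrete, here is a minimal self-contained sketch in plain Java. It models the idea rather than using the real classes: Execution stands in for a one-time-use QueryExecution, QUERY_LABEL stands in for the proposed ARQ.queryLabel symbol (which does not exist yet), and createExecution plays the role of SPARQL_Query.executeQuery. The comment-stripping "parse" is a crude placeholder, not how ARQ parses.

```java
import java.util.HashMap;
import java.util.Map;

// Model of "the execution object carries a per-operation label in its context".
final class QueryLabelSketch {

    // Hypothetical stand-in for the proposed ARQ.queryLabel symbol.
    static final String QUERY_LABEL = "arq:queryLabel";

    // Stand-in for a one-time-use QueryExecution: a parsed query plus a context.
    static final class Execution {
        final String parsedQuery;
        final Map<String, Object> context = new HashMap<>();

        Execution(String parsedQuery) {
            this.parsedQuery = parsedQuery;
        }
    }

    // Stand-in for the code that creates the execution from the request.
    static Execution createExecution(String rawQueryString) {
        // Crude placeholder for parsing: drop '#' comments to end of line.
        String parsed = rawQueryString.replaceAll("#.*", "").trim();
        Execution exec = new Execution(parsed);
        // The label lives in the context, not in the query object itself.
        exec.context.put(QUERY_LABEL, rawQueryString);
        return exec;
    }

    // Stand-in for low-level code (eval, a stage generator) that only sees the context.
    static String labelSeenByBackend(Map<String, Object> context) {
        Object label = context.get(QUERY_LABEL);
        return label == null ? "<unlabelled>" : label.toString();
    }

    public static void main(String[] args) {
        Execution exec = createExecution("ASK{} # Andy's query ");
        System.out.println(exec.parsedQuery);
        System.out.println(labelSeenByBackend(exec.context));
    }
}
```

Because the label sits in the per-execution context, the query object's equality and hash code are untouched by it.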

(Aside: I thought you'd be using OpExecutor so as to access the
filters and LeftJoins as well -- different discussion though ...
though I'd like to remove StageGenerator because there are too many
ways to do very similar things, making it messier to add new storage
layers ... so compatibility issue noted!)

I don't think having the raw query string breaks the Java
equality/hash code contract since the Query class is a
structural representation of a query, preserving the original
query string is just a convenience to users and doesn't change
the fact that the class is a structural representation of a query
and by definition different query strings can resolve to the same
definition (white space, comments, prefix ordering etc.)


In your use case, sure, the string is not particularly
significant.

The contract for .equals in Java is that for two objects to be equal
they must be substitutable for one another.  Jena is a general
library - some app may rely on the query label for display
purposes, or as a key into another data structure.  That's the
long-term promise being made and it's hard to predict what some
app may do - hence my desire for a strict adherence to the .equals
contract.

Also, by preserving the query string and comments, there is a
slippery slope to putting stuff in comments and relying on it.

Preserving the query string is a convenience in your use case but if
some other use case is relying on the label for something, it is no
longer ancillary.

This shows a difference - a bit artificial but it's also supposed
to be a small example -- imagine the two "put" operations being in
very different parts of the code:

try changing the order of the two .put -- I expected the different
output when it was put(q1,..), put(q2,...) but it is the other way
round, as below.  We live and learn about the runtime library
implementation -- HashMap.put keeps the key already in the map and
sets the entry's value to the last one put.  Other JREs may differ.

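import java.util.HashMap ;
import java.util.Map ;

import com.hp.hpl.jena.query.Query ;
import com.hp.hpl.jena.query.QueryFactory ;

public class QueryLabels {
    public static void main(String ... argv) {
        Map<Query, String> x = new HashMap<>() ;
        // Same query structure, different raw strings.
        Query q1 = QueryFactory.create("ASK{}") ;
        Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;

        // The second put finds an equal key and replaces only the value.
        x.put(q2, q2.getRawQuery()) ;
        x.put(q1, q1.getRawQuery()) ;

        if ( x.containsKey(q2) ) {
            System.out.println(x.get(q2)) ;
            System.out.println("---") ;
            System.out.println(q2.getRawQuery()) ;
        } else {
            System.out.println("Not found") ;
        }
    }
}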
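
The same effect can be reproduced without Jena on the classpath. The sketch below is a stand-in under stated assumptions: LabelledQuery is a hypothetical class whose equals/hashCode look only at the structure while a raw string rides along, which mirrors the situation in the example above but is not the Jena Query class.

```java
import java.util.HashMap;
import java.util.Map;

// Stdlib-only model of "structurally equal objects that carry different labels".
final class LabelEqualitySketch {

    // Hypothetical stand-in for Query: equality ignores the raw string.
    static final class LabelledQuery {
        final String structure;
        final String raw;

        LabelledQuery(String structure, String raw) {
            this.structure = structure;
            this.raw = raw;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof LabelledQuery
                    && structure.equals(((LabelledQuery) o).structure);
        }

        @Override
        public int hashCode() {
            return structure.hashCode();
        }
    }

    // Puts both queries keyed by themselves, then looks up the probe.
    static String lookupAfterPuts(LabelledQuery first, LabelledQuery second,
                                  LabelledQuery probe) {
        Map<LabelledQuery, String> m = new HashMap<>();
        m.put(first, first.raw);
        m.put(second, second.raw);   // equal key: the value is replaced
        return m.get(probe);
    }

    public static void main(String[] args) {
        LabelledQuery q1 = new LabelledQuery("ASK{}", "ASK{}");
        LabelledQuery q2 = new LabelledQuery("ASK{}", "ASK{} # Andy's query ");

        System.out.println(lookupAfterPuts(q2, q1, q2));  // prints q1's raw string
        System.out.println("---");
        System.out.println(q2.raw);                       // prints q2's raw string
    }
}
```

Looking up q2 returns q1's string when q1 was put last, so any code that treats the label as meaningful sees a different answer depending on insertion order.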
