Re: Carrying raw query strings (public API change).

Robert Vesse Fri, 13 Apr 2012 09:04:40 -0700

We work at the level of QueryEngine, but we have multiple implementations 
because, depending on the query, the backend can handle either the entire 
thing or only parts of it, so we override eval() in our query engine 
implementations.


Then we either use an OpExecutor or just parcel the query off wholesale to our 
backend.  So I was slightly inaccurate, in that we no longer use StageGenerator 
(though we did at one point).

Regardless, we are still at a level of the API where we don't see the 
QueryExecution, so we couldn't utilize the context even if we wanted to.
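
To make the context idea concrete: in the quoted message below, the one-time-use execution object carries a label in its context, and lower layers can read that label only if they are handed the context. The following is a minimal sketch in plain Java using stand-in types. Execution, QUERY_LABEL, createExecution and labelOf are hypothetical names for illustration, not ARQ or Fuseki classes.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryLabelSketch {
    // Hypothetical key, standing in for the proposed ARQ.queryLabel symbol.
    static final String QUERY_LABEL = "queryLabel";

    /** Stand-in for a one-time-use execution object that owns a context. */
    static final class Execution {
        private final Map<String, Object> context = new HashMap<>();

        Map<String, Object> getContext() {
            return context;
        }
    }

    /** What the endpoint would do: create the execution and label it with the raw string. */
    static Execution createExecution(String rawQueryString) {
        Execution exec = new Execution();
        exec.getContext().put(QUERY_LABEL, rawQueryString);
        return exec;
    }

    /** What lower-level code could do, provided it is actually given the context. */
    static String labelOf(Map<String, Object> context) {
        Object label = context.get(QUERY_LABEL);
        return label == null ? "<unlabelled>" : label.toString();
    }

    public static void main(String[] args) {
        Execution exec = createExecution("ASK{} # Andy's query ");
        System.out.println(labelOf(exec.getContext()));
        // Code that never receives the context has nothing to look the label up in.
        System.out.println(labelOf(new HashMap<String, Object>()));
    }
}
```

The second call in main models the situation described above: a layer that never sees the execution object falls back to the unlabelled case.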

The more I think about this, the less I think it actually solves our problem 
(carrying raw query strings), because we are still left with the issue that a 
query may turn into multiple queries internally.  Our developers wanted the 
query string associated with each of those internal queries, but that isn't a 
1:1 relationship.
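
The one-to-many shape of the problem can be sketched as follows. This is an illustration only: InternalQuery and plan are hypothetical names, not part of ARQ, and the fragments are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FanOutSketch {
    /** Hypothetical internal query, tagged with the user-level string it came from. */
    static final class InternalQuery {
        final String originLabel; // raw text of the user's query
        final String fragment;    // the part actually sent to the backend

        InternalQuery(String originLabel, String fragment) {
            this.originLabel = originLabel;
            this.fragment = fragment;
        }
    }

    /** One user query becomes several internal queries that all share one label. */
    static List<InternalQuery> plan(String rawQuery, List<String> fragments) {
        List<InternalQuery> out = new ArrayList<>();
        for (String fragment : fragments) {
            out.add(new InternalQuery(rawQuery, fragment));
        }
        return out;
    }

    public static void main(String[] args) {
        String raw = "SELECT * { ?s ?p ?o . ?o ?q ?r }";
        List<InternalQuery> parts = plan(raw, Arrays.asList("{ ?s ?p ?o }", "{ ?o ?q ?r }"));
        for (InternalQuery part : parts) {
            System.out.println(part.fragment + " <- " + part.originLabel);
        }
    }
}
```

Because every fragment carries the same label, the label identifies the originating request rather than any single internal query, which is why a string stored on one query object does not map cleanly onto the work the backend sees.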

Maybe it is for the best if I just go ahead and revert those changes?

Rob

On Apr 12, 2012, at 12:02 PM, Andy Seaborne wrote:

> Rob -
> 
> On 12/04/12 17:43, Robert Vesse wrote:
>> The notion of jobs makes sense to me but it implies some refactoring
>> of our APIs that is simply not feasible in our current setup. Where we
>> use Fuseki this is not doable because we are extending Fuseki
>> indirectly by hooking into ARQ's QueryExecutionFactory mechanism and
>> so don't have any means to create this Job thing prior to starting to
>> see the actual query in our ARQ integration layer.
>>
>> Even in a hypothetical situation where we did have such capability we
>> still run into the issue that at some point the query has to drop
>> into the ARQ machinery to be processed at which point it has to be a
>> query and we'd lose any visibility back to our Job notion anyway.
>> This is especially true since the point at which we actually send
>> work off to our backend for processing is potentially very low level
>> in the ARQ API (as far down as the Stage Generator layer)
> 
> This makes me a bit nervous; the needs of Cray to tunnel info from one place 
> to another because of current code structure balanced against a long term 
> change to the public API.
> 
> The good news is there is a better way in Fuseki.
> 
> The QueryExecution object is a one-time-use object and it has somewhere to 
> put such additional information - getContext().  This is where the current 
> time for the query goes for example.  It even gets to the StageGenerator.  
> It's already got the query as an object.
> 
> The Fuseki-specific HttpActionQuery doesn't get into ARQ - it's the nearest I 
> can see to the "Job" from the point of view of the web request.
> 
> So we can have the QueryExecution carry a per-operation label.
> 
> Change:
> 
> 1/ Add a new symbol: ARQ.queryLabel
> 
> 2/ SPARQL_Query.executeQuery creates the QueryExecution and can set the 
> context with a key/value that is the query string as ARQ.queryLabel.  It 
> knows the queryStringLog -- it can take the original query string as well, 
> or we can put it in the HttpActionQuery and put it in the execution context.
> 
> (Aside: I thought you'd be using OpExecutor so as to access the filters and 
> LeftJoins as -- different discussion though ... though I'd like to remove 
> StageGenerator because there are too many ways to do very similar things, 
> making it messier to add new storage layers ... so compatibility issue noted!)
> 
>> I don't think having the raw query string breaks the Java
>> equality/hash code contract since the Query class is a structural
>> representation of a query, preserving the original query string is
>> just a convenience to users and doesn't change the fact that the
>> class is a structural representation of a query and by definition
>> different query strings can resolve to the same definition (white
>> space, comments, prefix ordering etc.)
> 
> In your use case, sure, the string is not particularly significant.
> 
> The contract for .equals in Java is that for two objects to be equal they 
> must be substitutable for one another.  Jena is a general library - some app 
> may rely on the query label for display purposes, or as a key into another 
> data structure.  That's the long-term promise being made and it's hard to 
> predict what some app may do - hence my desire for a strict adherence to 
> the .equals contract.
> 
> Also, by preserving the query string and comments, there is a slippery slope 
> to putting stuff in comments and relying on it.
> 
> Preserving the query string is a convenience in your use case but if some 
> other use case is relying on the label for something, it is no longer 
> ancillary.
> 
> This shows a difference - a bit artificial but it's also supposed to be a 
> small example -- imagine the two "put" operations being in very different 
> parts of the code:
> 
> Try changing the order of the two .put calls -- I expected the different 
> output when it was put(q1,...), put(q2,...) but it is the other way round, 
> as below.  We live and learn about the runtime library implementation -- 
> when an equal key is already present, HashMap.put keeps the existing key and 
> replaces its value with the one from the last put.  Other JREs may differ.
> 
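> import java.util.HashMap ;
> import java.util.Map ;
>
> // Package names as in the Jena releases current at the time of this thread.
> import com.hp.hpl.jena.query.Query ;
> import com.hp.hpl.jena.query.QueryFactory ;
>
> public class QueryLabels
> {
>    public static void main(String ... argv)
>    {
>        Map<Query, String> x = new HashMap<>() ;
>        // q1 and q2 parse to the same structure; only the raw text differs.
>        Query q1 = QueryFactory.create("ASK{}") ;
>        Query q2 = QueryFactory.create("ASK{} # Andy's query ") ;
>
>        // getRawQuery() is the raw-string accessor from the change under
>        // discussion.  The second put hits the entry created by the first.
>        x.put(q2, q2.getRawQuery()) ;
>        x.put(q1, q1.getRawQuery()) ;
>
>        if ( x.containsKey(q2) )
>        {
>            // The value stored under q2 versus q2's own raw string.
>            System.out.println(x.get(q2)) ;
>            System.out.println("---") ;
>            System.out.println(q2.getRawQuery()) ;
>        }
>        else
>        {
>            System.out.println("Not found") ;
>        }
>    }
> }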
>
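
The map behaviour that the quoted example depends on can be reproduced without Jena. The sketch below uses a hypothetical ParsedQuery class as a stand-in for Query, on the assumption that equality and hash code are structural and ignore the raw text. It illustrates java.util.HashMap only and makes no claim about how Query itself is implemented.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyRetentionDemo {
    /** Hypothetical stand-in for a parsed query: equality ignores the raw text. */
    static final class ParsedQuery {
        final String structure; // normalized form, e.g. "ASK{}"
        final String raw;       // original text, including comments

        ParsedQuery(String structure, String raw) {
            this.structure = structure;
            this.raw = raw;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ParsedQuery
                && structure.equals(((ParsedQuery) o).structure);
        }

        @Override
        public int hashCode() {
            return structure.hashCode();
        }
    }

    /** Puts first then second (each mapped to its raw text) and looks up probe. */
    static String lookupAfterPuts(ParsedQuery first, ParsedQuery second, ParsedQuery probe) {
        Map<ParsedQuery, String> m = new HashMap<>();
        m.put(first, first.raw);
        m.put(second, second.raw); // equal key: the value is replaced
        return m.get(probe);
    }

    public static void main(String[] args) {
        ParsedQuery q1 = new ParsedQuery("ASK{}", "ASK{}");
        ParsedQuery q2 = new ParsedQuery("ASK{}", "ASK{} # Andy's query ");

        // Order used in the quoted example: looking up q2 returns q1's raw text.
        System.out.println(lookupAfterPuts(q2, q1, q2));
        // Reversed order: looking up q2 returns q2's own raw text.
        System.out.println(lookupAfterPuts(q1, q2, q2));
    }
}
```

With put(q2), put(q1), the lookup for q2 yields the string that belonged to q1, so two objects that compare equal are not interchangeable once the raw text is used for anything.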
