Re: Thread safety/reuse if Query ?

Kristian Rosenvold Thu, 04 Dec 2014 09:12:49 -0800

And if the Query object "by design" is supposed to only contain
metadata that should be initialized once, it should be fairly simple
to patch it for full concurrent thread safety, at least for instances
that are constructed before any consuming threads are created....
(Some of the variable names within the "Query" object certainly had me
fooled at first reading; - the list called "resultNodes" certainly
implies request-specific state for the uninitiated reader)


Kristian

2014-12-04 18:02 GMT+01:00 Kristian Rosenvold <[email protected]>:
> A single instance of the Query object is definitely not safe for
> concurrent access. I assumed it was, and that blew up on my first
> attempt at adding any kind of load. The first thing I see within the
> Query#setResultVars method is a boolean toggle "resultVarsSet" that
> only allows this to be set once per Query instance. If the Query
> instance was to be safe for concurrent access, there'd have to be
> synchronization around that.
>
> Alternately you could say that a single instance of a Query object is
> safe for sequential re-use (and that all the fields are really only
> metadata  fields), which I did not actually test. Are you saying this
> is supposed to be the case ? If so, it looks like it's sequential
> re-use on the single thread that created the Query, since that's the
> only kind of memory model visibility I see this class handling..?
>
> As for the queries in question they're pretty basic stuff, but we do a
> lot of them. Re-parsing the queries is over 50% of our processing time
> as measured by YJP. We run the same basic 10 different queries over
> nearly a million different graphs, which adds up to a fair bit of
> heat.
>
> (I can provide samples but they'd just be very mundane select SPARQL
> stuff, I am not claiming these to be very complex queries...)
>
> Kristian
>
> 2014-12-04 17:36 GMT+01:00 Andy Seaborne <[email protected]>:
>> On 04/12/14 15:01, Kristian Rosenvold wrote:
>>>
>>> And now I can seemingly use
>>>
>>> QueryExecution exec = QueryExecutionFactory.create(copyOf(q1template),
>>> model,params);
>>>
>>> and actually get this working concurrently with re-use of the parsed
>>> sparql expression. But of course there's ElementSubQuery pointing back
>>> at the Query object, so a deep copy of the queryPattern Element
>>> structure would probably also be required, which all seems a bit like
>>> barking up the wrong tree. And I'm not really sure I understand all
>>> the cross-references well enough to know that this would even
>>> work....(?)
>>
>>
>> The QueryExecutionFactory should work - the query parsed object is not
>> modified.  The params are used to seed the query execution.  The Query
>> object is reusable. (It had better be -- Query must be immutable after they
>> are fully created to be valid as map keys.)  A bit like an external VALUES
>> clause.
>>
>> Where are the actual costs?  Reparsing the string will be expensive at some
>> level of intense usage; manipulating the AST may not be.
>>
>> RIOT uses custom tokenizing and parsing for Turtle etc because javacc was
>> slow for large data.  SPARQL is a lot more complicated as a grammar (Turtle
>> is pretty trivial) and a parser generator is quite important yet it does all
>> that single char messing around that general tokenizing/parsing has to do.
>>
>> There is a "query template" system that works by minimally rewriting the AST
>> -- it has not been incorported into the code base; I wrote it at a time we
>> were discussing ParameterizedQueryString and security issues of injection.
>> It has test cases though :-)
>>
>> Do you have an exmaple of the sort of query you are working with?
>>
>>         Andy
>>
>> https://github.com/afs/AFS-Dev/tree/master/src/main/java/element
>>

Re: Thread safety/reuse if Query ?

Reply via email to