Re: Parallelism in query execution

Rob Vesse Mon, 08 Feb 2016 02:44:02 -0800

Comments inline:

On 04/02/2016 14:54, "Andy Seaborne" <[email protected]> wrote:


>On 02/02/16 14:52, Rob Vesse wrote:
>> The other place huge UNIONs can appear is if you want to try and flatten
>> out property paths queries involving +, - and * into approximated
>>queries
>> that yield the same results
>>
>> I don't think Jena does this but we at Cray certainly provide users the
>> option to do this
>
>Do you formally support :p{N,M} in paths?  IIRC they have different
>cardinality from * and +

No we don't

>
>What's "-"?

A typo, meant to be "|" I.e. alternative
>
>> I like the idea of maybe buffering slightly and then intelligently
>> deciding what kind of join to use.  However this probably needs to take
>> into account the overall query structure because if you have a small
>> LIMIT/OFFSET present then you want to do the minimum work possible so
>> don't want to compute an unnecessary hash join when an index join could
>> have yielded the one result you needed much faster.
>
>I agree that the technique needs some care.  It is in danger of
>multiplying up as well, because an index/substitution execution begats N
>follow up units.
>
>For parallelism, the index could be chunked into work units of, say, 25
>intermediates and run one parallel processor per 25 hits. This is trying
>to amortize the channel and synchronization costs.

I like that approach, it seems like a sensible balance

Rob

>
>       Andy
>
>>
>> Rob
>>
>> On 02/02/2016 14:17, "Andy Seaborne" <[email protected]> wrote:
>>
>>> Rob,
>>>
>>> Thanks for the details.  Lots of good stuff in there.
>>>
>>> I am thinking about both tweaking current execution and for a new
>>> (hypothetical) execution engine more targeted at analytics workloads,
>>>say.
>>>
>>> I have seem some truly monstrous UNIONs (1000s), and FILTER (NOT) IN
>>> leads to unions as well.
>>>
>>> The case of a IFP yielding one result, and using that as an index join
>>> is somewhat import. A small amount of buffering probably would not hurt
>>> the execution and make it adapt to large LHS matches. One thing that
>>> might be interesting is switch between index and hash join by sniffing
>>> the input stream and if it is below some threshold use index join and
>>>do
>>> a block hash join on a few multiple threads above the threshold.  Or a
>>> full-blown parallel hash join (takes twice the working state - it
>>> retains up to about twice the size of the smaller side of the join).
>>>
>>>
>>>     Andy
>>>
>>> On 29/01/16 14:06, Rob Vesse wrote:
>>>> Andy
>>>>
>>>> Parallelism is used in a couple of main places, bear in mind that
>>>> dotNetRDF uses a block based engine rather than a streaming engine
>>>>(I.e.
>>>> no OpSequence, OpConditional) although it does do a form of index join
>>>> to
>>>> avoid unnecessary work wherever possible.  The main areas in which it
>>>> uses
>>>> parallelism are joins and filters.  So the unit of work is over a
>>>> multi-set of possible solutions in SPARQL spec parlance.
>>>>
>>>> For joins since we are joining (or left joining or minus-ing) the
>>>> results
>>>> of two operators we essentially do a parallelized hash join between
>>>>the
>>>> two sides.  The hash table is built in serial from the LHS results and
>>>> then we parallelise over the RHS results doing look ups into the hash
>>>> table and outputting the join results in parallel.  You have to be
>>>> careful
>>>> about the data structures used to avoid threads stomping each others
>>>> results but this isn't too difficult.
>>>>
>>>> For filters we do something much closer to vectorization where
>>>> essentially
>>>> we parallelise the evaluation of the expression over all the possible
>>>> solutions (again we're a blocking engine) and recombine the results
>>>> afterwards.
>>>>
>>>> In terms of control in the .Net world we benefit from PLinq which are
>>>> CLR
>>>> supplied extensions to the basic Linq constructs (aka Streams in Java
>>>>8)
>>>> that automatically parallelize according to the available resources on
>>>> the
>>>> machine.  I'm not sure how smart this is but at least in the .Net
>>>>world
>>>> you can constrain it if you want.
>>>>
>>>> For Jena which is a predominantly streaming engine I don't see either
>>>>of
>>>> these approaches providing great benefits.  They could be used in the
>>>> cases where ARQ does have to do block evaluation to improve things
>>>> subject
>>>> to the points about how best to control the level of parallelism.  The
>>>> other area where there may be some benefit is to consider the case of
>>>> queries with UNION constructs, particularly for queries where we are
>>>> doing
>>>> lazy evaluation (there's a LIMIT and/or OFFSET, ASK etc.) where it may
>>>> be
>>>> possible to dispatch multiple branches of the UNION in parallel.
>>>>
>>>> Rob
>>>>
>>>> On 29/01/2016 11:49, "Andy Seaborne" <[email protected]> wrote:
>>>>
>>>>> Rob,
>>>>>
>>>>> In dotNetRDF, there is parallel execution, isn't there?
>>>>>
>>>>> I have been thinking (toying with) the idea of parallel execution
>>>>>and I
>>>>> wondered what unit of work is for the parallelism in dotNetRDF.
>>>>>
>>>>> What little thinking I've done suggests that tapping into the
>>>>> parallelism in java streams is not the right way to do it (which is a
>>>>> shame as that's less work).  It needs more control and probably
>>>>>larger
>>>>> units of work. There is a danger that small/fast queries slow down
>>>>>due
>>>>> to too much thinking.
>>>>>
>>>>> It needs more control as well to limit how much of the machine it
>>>>>will
>>>>> take over because, in Fuseki, it might lead to starvation of other
>>>>> requests.  As some usage is " many clients, many small requests",
>>>>> parallelism can impact the the system negatively as well as
>>>>>positively.
>>>>> At some point, the limitation will be the connection of CPUs to RAM
>>>>> rather than cycles.
>>>>>
>>>>>       Andy
>>>>>
>>>>> Historical note: RDQL had true parallelism once-upon-a-time.  An RDQL
>>>>> query is a BGP+Filter and not more.  The filter ran on a separate
>>>>> thread
>>>>> to the BGP solver.  Timed gain ... just 10%.  This was on a early
>>>>> generation 2 CPU, 2 processor machine so the cost of threading was
>>>>> huge.
>>>>>    Most users then did not have a multi-anything machine. It lead to
>>>>> lots
>>>>> of problems with thread management when Java wasn't what it is today.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

Re: Parallelism in query execution

Reply via email to