Doesn't the base Jena implementation do a hash join by default?
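For reference, here is a minimal Python sketch of the hash join Andy suggests, keyed on the shared variables (?foo, ?fob). It is not Jena/ARQ code: rows are plain dicts, the name hash_join is made up for illustration, and it assumes the key variables are bound in every row (full SPARQL join compatibility also has to cope with unbound variables). Only the build side is held in memory; the probe side is streamed one row at a time.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def hash_join(build: Iterable[Row], probe: Iterable[Row],
              keys: Tuple[str, ...]) -> Iterator[Row]:
    """Inner hash join of two streams of solution rows on the shared variables.

    Hypothetical helper: assumes every key variable is bound in every row.
    """
    # Build phase: the only side held in memory, bucketed by join key.
    table: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in build:
        table[tuple(row[k] for k in keys)].append(row)
    # Probe phase: stream the other side and emit one row per matching pair.
    for row in probe:
        for match in table.get(tuple(row[k] for k in keys), []):
            yield {**match, **row}  # combine the two compatible solutions


a_rows = [{"foo": "f1", "bar": "b1", "fob": "o1"},
          {"foo": "f2", "bar": "b2", "fob": "o2"}]
b_rows = [{"foo": "f1", "baz": "z1", "fob": "o1"},
          {"foo": "f3", "baz": "z3", "fob": "o3"}]
print(list(hash_join(a_rows, b_rows, ("foo", "fob"))))
# [{'foo': 'f1', 'bar': 'b1', 'fob': 'o1', 'baz': 'z1'}]
```

Building on whichever endpoint returns fewer rows keeps the in-memory table as small as possible.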

I am also thinking that, when querying very large endpoints (chembank, genome, etc.),
it is entirely possible that the results from one endpoint may exceed the
memory of the system making the call.
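If even the smaller side does not fit in memory, one standard workaround (a swapped-in technique, not something Jena or the endpoints are assumed to provide) is a partitioned, or "Grace", hash join: hash both inputs on the join key into N temporary files, then join each pair of partitions in memory. The sketch below uses Python's pickle and tempfile modules; grace_hash_join and n_partitions are illustrative names, and it again assumes the key variables are always bound.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def _key(row: Row, keys: Tuple[str, ...]) -> Tuple:
    return tuple(row[k] for k in keys)


def _partition(rows: Iterable[Row], keys: Tuple[str, ...], n: int,
               directory: str, tag: str) -> List[str]:
    """Spill each row to one of n files chosen by hashing its join key."""
    paths = [os.path.join(directory, f"{tag}-{i}.pkl") for i in range(n)]
    files = [open(p, "wb") for p in paths]
    try:
        for row in rows:
            # hash() is salted per process; that is fine here because both
            # sides are partitioned within the same run.
            pickle.dump(row, files[hash(_key(row, keys)) % n])
    finally:
        for f in files:
            f.close()
    return paths


def _read(path: str) -> Iterator[Row]:
    """Stream rows back from a partition file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def grace_hash_join(build: Iterable[Row], probe: Iterable[Row],
                    keys: Tuple[str, ...],
                    n_partitions: int = 16) -> Iterator[Row]:
    """Inner join holding only one build partition in memory at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        build_parts = _partition(build, keys, n_partitions, tmp, "build")
        probe_parts = _partition(probe, keys, n_partitions, tmp, "probe")
        # Equal keys land in the same partition index on both sides,
        # so each pair of partitions can be joined independently.
        for build_path, probe_path in zip(build_parts, probe_parts):
            table: Dict[Tuple, List[Row]] = defaultdict(list)
            for row in _read(build_path):
                table[_key(row, keys)].append(row)
            for row in _read(probe_path):
                for match in table.get(_key(row, keys), []):
                    yield {**match, **row}
```

Partitioning only helps if the keys spread reasonably evenly; a heavily repeated (?foo, ?fob) value can still produce a partition too large for memory, which fuller implementations handle by re-partitioning recursively.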


On Wed, Sep 4, 2013 at 10:36 AM, Andy Seaborne <[email protected]> wrote:

> On 04/09/13 09:33, Claude Warren wrote:
>
>> I have been thinking about strategies for optimizing federated queries and
>> have come to the point where I need to do a merge of potentially very
>> large result sets where not all result sets contain all the variables.
>>
>> Consider:
>>
>> { Service <a> { [] <x:foo> ?foo ;
>>   <x:bar> ?bar ;
>>   <x:fob> ?fob .
>> } }
>> union
>> { Service <b> { [] <x:foo> ?foo ;
>>    <x:baz> ?baz ;
>>    <x:fob> ?fob .
>> } }
>>
>> which would yield results sets that have the structure:
>> {?foo ?bar ?fob ?baz}
>>
>> ?bar will always come from <a> and ?baz will always come from <b>, but
>> ?foo and ?fob may come from either.
>>
>> I am thinking that the results from <a> could be inserted into a temporary
>> graph as
>>
>> _:x <x:foo> ?foo
>> _:x <x:bar> ?bar
>> _:x <x:fob> ?fob
>>
>> then results from <b> could be merged into the graph as updates to
>> existing records that match { [] <x:foo> ?foo ; <x:fob> ?fob }; any
>> results with no matching record could be inserted as new records.
>>
>> The merged result set can then be extracted from the temporary graph as
>>
>> select ?foo ?bar ?fob ?baz
>> where { ?dummy <x:foo> ?foo ;
>>   <x:fob> ?fob .
>> OPTIONAL
>>   { ?dummy <x:bar> ?bar }
>> OPTIONAL
>>   { ?dummy <x:baz> ?baz}
>> }
>>
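To check the semantics of the temporary-graph merge above, here is a small in-memory Python simulation. It is a hypothetical stand-in, not an RDF store or Jena API: each record plays the role of one blank node carrying x:foo and x:fob plus sets of x:bar / x:baz values, and extract() mimics the SELECT with two OPTIONALs by reporting an unbound variable as None. It assumes ?foo and ?fob are bound in every row.

```python
from itertools import product
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[str]]


class TempGraph:
    """Hypothetical in-memory stand-in for the temporary graph.

    Each record corresponds to one blank node _:x with a single x:foo and
    x:fob value and a set of x:bar / x:baz values (graph triples are sets).
    """

    def __init__(self) -> None:
        self.records: List[dict] = []
        # Index on the shared variables, playing the role of the
        # { [] <x:foo> ?foo ; <x:fob> ?fob } lookup.
        self.index: Dict[Tuple[str, str], List[dict]] = {}

    def _new_record(self, foo: str, fob: str) -> dict:
        rec = {"foo": foo, "fob": fob, "bar": set(), "baz": set()}
        self.records.append(rec)
        self.index.setdefault((foo, fob), []).append(rec)
        return rec

    def insert_a(self, rows: Iterable[Row]) -> None:
        # Every solution from <a> becomes a fresh blank node.
        for r in rows:
            self._new_record(r["foo"], r["fob"])["bar"].add(r["bar"])

    def merge_b(self, rows: Iterable[Row]) -> None:
        # Solutions from <b> update every record with the same (?foo, ?fob);
        # a solution with no matching record is inserted as a new record.
        for r in rows:
            matches = self.index.get((r["foo"], r["fob"]))
            if not matches:
                matches = [self._new_record(r["foo"], r["fob"])]
            for rec in matches:
                rec["baz"].add(r["baz"])

    def extract(self) -> Iterator[Row]:
        # Mimics the SELECT with two OPTIONALs: no value means unbound (None),
        # and several values produce one row per combination.
        for rec in self.records:
            bars = sorted(rec["bar"]) or [None]
            bazs = sorted(rec["baz"]) or [None]
            for bar, baz in product(bars, bazs):
                yield {"foo": rec["foo"], "bar": bar, "fob": rec["fob"], "baz": baz}


g = TempGraph()
g.insert_a([{"foo": "f1", "bar": "b1", "fob": "o1"},
            {"foo": "f2", "bar": "b2", "fob": "o2"}])
g.merge_b([{"foo": "f1", "baz": "z1", "fob": "o1"},
           {"foo": "f3", "baz": "z3", "fob": "o3"}])
for row in g.extract():
    print(row)
# {'foo': 'f1', 'bar': 'b1', 'fob': 'o1', 'baz': 'z1'}
# {'foo': 'f2', 'bar': 'b2', 'fob': 'o2', 'baz': None}
# {'foo': 'f3', 'bar': None, 'fob': 'o3', 'baz': 'z3'}
```

Run this way, the merge keeps rows from either side that have no partner, with the missing variable unbound, so it behaves essentially like a full outer join on (?foo, ?fob) (up to duplicate handling, since graph values are sets) rather than a UNION of the two SERVICE results. It also holds both result sets in the graph at the same time.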
>
> Isn't that the join of the two service calls, not the union?
>
> Have you considered doing a hash-join with a key (?foo, ?fob), the shared
> variables?
>
> A hash join only needs a complete copy of one of the tables in memory, not
> both.
>
>         Andy
>
>
>> Questions:
>> Has anyone attempted this?
>> Does anyone see any functional issues with the approach?
>>
>> I understand that there would be performance issues with small datasets,
>> but for large datasets it may make sense.  Thoughts?
>>
>>
>> Claude
>>
>>
>


-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
