On 04/09/13 09:33, Claude Warren wrote:
I have been thinking about strategies for optimizing federated queries and
have come to the point where I need to do a merge of potentially very large
result sets where not all result sets contain all the values.

Consider:

Service <a> { [] <x:foo> ?foo ;
  <x:bar> ?bar ;
  <x:fob> ?fob .
}
union
Service <b> { [] <x:foo> ?foo ;
   <x:baz> ?baz ;
   <x:fob> ?fob .
}

which would yield result sets that have the structure:
{?foo ?bar ?fob ?baz}

?bar will always come from <a> and ?baz will always come from <b>, but ?foo
and ?fob may come from either.

I am thinking that the results from <a> could be inserted into a temporary
graph as

_x <x:foo> ?foo
_x <x:bar> ?bar
_x <x:fob> ?fob

then results from <b> could be inserted into the graph as updates to
existing records that match { [] <x:foo> ?foo ;  <x:fob> ?fob }. Any
records with no match could be inserted as new records.
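
As a rough illustration of the update-or-insert step described above, here
is a minimal Python sketch that uses plain dictionaries in place of the
temporary graph. The function name merge_into_store and the row format
(dicts keyed by variable name) are assumptions made for the example; they
are not part of any SPARQL engine.

```python
from collections import defaultdict

KEY_VARS = ("foo", "fob")  # shared variables used to match records


def merge_into_store(a_rows, b_rows, key_vars=KEY_VARS):
    """Merge rows from <b> into rows from <a>, matching on key_vars.

    Rows are dicts such as {"foo": 1, "bar": 2, "fob": 3}. A <b> row
    updates every stored record with the same key; if no record
    matches, the row is inserted as a new record.
    """
    store = defaultdict(list)  # key tuple -> list of records

    # Load all <a> results first (hypothetical stand-in for the graph).
    for row in a_rows:
        key = tuple(row[v] for v in key_vars)
        store[key].append(dict(row))

    # Apply <b> results as updates, inserting where nothing matches.
    for row in b_rows:
        key = tuple(row[v] for v in key_vars)
        if key in store:
            for record in store[key]:
                record.update(row)
        else:
            store[key].append(dict(row))

    return [record for records in store.values() for record in records]
```

In this sketch, unmatched rows from either side are kept, so some merged
records carry only ?bar or only ?baz.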

The merged result set can then be extracted from the temporary graph as

select ?foo ?bar ?fob ?baz
where { ?dummy <x:foo> ?foo ;
  <x:fob> ?fob .
OPTIONAL
  { ?dummy <x:bar> ?bar }
OPTIONAL
  { ?dummy <x:baz> ?baz }
}

Isn't that the join of the two service calls, not the union?

Have you considered doing a hash-join with a key (?foo, ?fob), the shared variables?

A hash join only needs a complete copy of one of the tables in memory, not both.
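
To make the idea concrete, here is a minimal Python sketch of a hash join
keyed on (?foo, ?fob). The name hash_join and the dict-per-row format are
assumptions for illustration only. The build side is held in memory, while
the probe side can be any iterator and is consumed one row at a time.

```python
def hash_join(build_rows, probe_rows, key_vars=("foo", "fob")):
    """Yield merged rows for every build/probe pair that agrees on key_vars."""
    # Build phase: index one complete result set by the shared variables.
    table = {}
    for row in build_rows:
        key = tuple(row[v] for v in key_vars)
        table.setdefault(key, []).append(row)

    # Probe phase: stream the other result set; nothing is retained from it.
    for row in probe_rows:
        key = tuple(row[v] for v in key_vars)
        for match in table.get(key, ()):
            yield {**match, **row}
```

Choosing the smaller of the two results as the build side keeps the memory
footprint down. This version yields only rows that match on both sides.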

        Andy

Questions:
Has anyone attempted this?
Does anyone see any functional issues with the approach?

I understand that there would be performance issues with small datasets,
but for large datasets it may make sense.  Thoughts?


Claude

