On Tue, Jul 3, 2012 at 10:56 AM, Claude Warren <[email protected]> wrote:
> I was looking at a couple of queries and found that
>
> SELECT * WHERE {
>  ?x a <foo> .
>  OPTIONAL{ ?x <bar> ?bar } .
>  OPTIONAL{ ?y <baz> ?baz }
> }
>
> becomes
> (leftjoin
>   (leftjoin
>     (bgp (triple ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <file:///home/claude/workspace/RoadMap/foo>))
>     (bgp (triple ?x <file:///home/claude/workspace/RoadMap/bar> ?bar)))
>   (bgp (triple ?x <file:///home/claude/workspace/RoadMap/baz> ?baz)))
>
> BUT ....
>
> SELECT * WHERE {
>  ?x a <foo> .
>  OPTIONAL{ ?x <bar> ?bar .
>     ?y <baz> ?baz }
> }
>
> becomes
>
> leftjoin
>   (bgp (triple ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <file:///home/claude/workspace/RoadMap/foo>))
>   (bgp
>     (triple ?x <file:///home/claude/workspace/RoadMap/bar> ?bar)
>     (triple ?x <file:///home/claude/workspace/RoadMap/baz> ?baz)
>   ))
>
>
> It seems to me that the 2nd form is much more efficient.  (Am I wrong here?)

Doesn't the second form create an outer product on the RHS? Isn't that
going to be less efficient?

The outer product could be computer lazily, but then isn't the effect
going to be more like the first form?

Looking at the code for Optional, I find myself in QueryIterLeftJoin,
which is mostly implemented in QueryIterJoinBase. (Did I go to the
right place?) From the looks of things in here, the first thing that
code does is to create a table from its RHS argument.

Is there any case where one of these operations evaluates itself WRT
the RHS lazily?

> That leads me to ask "Is there ever a time when 2 adjacent option
> statements can not be merged into a single statement?"

I have work to do, so I'm not trying to think of every case here. I'm
just looking at what you've done where the first OPTIONAL shares a
variable with the LHS, and the second OPTIONAL shares none.

In that case, the merged form is nearly equivalent, but not quite.

If the first optional term:
  OPTIONAL{ ?x <bar> ?bar }
results in no bindings and the second optional term:
  OPTIONAL{ ?y <baz> ?baz }
has bindings, then the final result will contain the bindings from the
second optional term.

However, the outer product of the first term and the second will be
empty, so the merged form of:
  OPTIONAL{ ?x <bar> ?bar . ?y <baz> ?baz}
will be empty. This means that the bindings for ?y and ?baz will NOT
appear in the merged form if there are no bindings for ?z and ?bar.

Ergo, they are not equivalent.

If you want to see, I just checked it against the following data:
@prefix ex: <http://example.com/ex#> .
ex:Alice        ex:childOf      ex:Bruce .
ex:Chuck        ex:brotherOf    ex:Bruce .

prefix ex: <http://example.com/ex#>
select * {
  ?x ex:brotherOf ?y
  OPTIONAL {?x ex:foo ?z . ?s ex:childOf ?o}
}

result:
{ x=<http://example.com/ex#Chuck>  y=<http://example.com/ex#Bruce> }


prefix ex: <http://example.com/ex#>
select * {
  ?x ex:brotherOf ?y
  OPTIONAL {?x ex:foo ?z }
  OPTIONAL {?s ex:childOf ?o}
}

result:
{ x=<http://example.com/ex#Chuck>  y=<http://example.com/ex#Bruce>
s=<http://example.com/ex#Alice>  o=<http://example.com/ex#Bruce> }


> And if not, should the parser be optimized to do the merging?

The two could possibly be handled with the same code, so long as there
were flags to indicate the behavior to adopt should an edge case be
encountered (such as one of the terms being empty). Typically, you
want a compelling use case to add this kind of complexity. In this
case, that would mean two forms of the query with the same output on a
given data set operating at significantly different speeds.

Given that the current code materializes the RHS, then it often be
better to opt for the first form. A dynamic optimizer could make that
determination based on the relative sizes of the binding sets
resulting from each form (if each optional resulted in a large set of
bindings, then the outer product would be *very* large).

Regards,
Paul

Reply via email to