Hi,

I finally got to writing those tests, which I forwarded on to the
SPARQL WG. I'm glad I did, as I'd missed an edge case.  :-)

Please find my patch for jena-arq included. I don't know how well I've
stuck to Jena standards in naming/packaging/etc, though I tried to
copy the style. I'm happy to take any feedback so that I can minimize
the review effort on any future code I may write.

This patch optimizes the common case where all common variables on
both sides of a MINUS expression are bound. If the RHS (the
subtrahend) contains an unbound value then it reverts to the same
linear search algorithm that currently exists. If the LHS (the
minuend) contains an unbound value, then that binding will be matched
using a linear search algorithm, though bindings will all the common
variables bound will use the indexed data.

The index used here is a hash map. Of course, that can overflow the
heap but this is no different to the original code. If there is a need
to use disk to avoid the heap then I have options there too.

Regards,
Paul

On Fri, Jun 15, 2012 at 12:53 AM, Paul Gearon <[email protected]> wrote:
> On Thu, Jun 14, 2012 at 4:25 PM, Rob Vesse <[email protected]> wrote:
>> The 1.1 test suite does contain tests for MINUS but AFAICT the ARQ trunk
>> does not contain these.  I was actually asking about this in a separate
>> thread on the dev list the other day.  Presumably at least Andy has them
>> somewhere because he produces EARL reports re: ARQ's SPARQL 1.1 compliance
>> but it doesn't look like that test suite is anywhere obvious in SVN yet
>
>
> Ah, I had mistakenly thought that I had made my search for "minus"
> case insensitive. :-)
>
> However, while I see numerous tests for MINUS with interesting
> combinations of FILTERS, NOT EXISTS, and cascaded MINUS operations, I
> do not see any that result in either the minuend (LHS) or the
> subtrahend (RHS) having unbound variables. It's these situations that
> lead to so many edge cases, and make indexing so hard. Fortunately,
> they're not very likely to happen in real world use-cases, so
> efficiency is less of a concern, but they still need to be addressed
> for correctness.
>
> Paul

Reply via email to