Hi Andy!

Thanks a lot for your answers; they gave me a lot of insight into this.

Andy Seaborne wrote on 17.12.2018 at 18:40:
1. Should this be considered a bug, or is it just an obscure case of the query optimizer working a bit differently than before?

2. Can you recommend how I should fix the query so it won't blow up again?

Is this faster, and does it get the right answer?

Yes, it is as fast as before Jena 3.8.0 (<100ms for the original query, <10ms for the minimal one) and the answer is correct.

VALUES at start of WHERE.

I did this, and it gives nearly the same performance boost as switching to MINUS. The version with MINUS seems to be slightly faster (59ms vs 66ms in one particular test run, which I repeated many times to confirm that the small difference is indeed real).

Looks like JENA-1534.

Having VALUES as a whole-query additional clause, and then using it in the outer and innermost levels, but not in between, stops the VALUES end clause from being moved to the start of the WHERE block.

Right, this could well be the explanation.

(I am having difficulty working out what the query is trying to do!)

Sure, it's a bit difficult...

I tried to explain in my previous mail. But now I also made a diagram:
https://docs.google.com/drawings/d/1nd-_pk3BEq2D_Cd1HkA_uHGhBI8KxUVYEHmVP9u80Ck/edit?usp=sharing

It has to do with SKOS concept hierarchies and collections (often called arrays in thesaurus terminology). For a concept such as "milk", I want to display the narrower concepts (e.g. "cow milk", "goat milk") grouped by the collections/arrays they may be placed under. But I only want to display collections/arrays in which *all* concepts are narrower concepts of "milk", such as <goat products>.

Or, put another way, when querying for the narrower concepts of "milk" with their arrays, I *don't* want to include arrays that contain at least one concept that *isn't* a narrower concept of "milk". Thus the double negative in the query: FILTER NOT EXISTS { ... FILTER NOT EXISTS { ... } } (or MINUS in the new formulation).
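To make the double negative a bit more concrete, here is a toy sketch of the same logic in plain Python sets. It is not the actual SPARQL query and says nothing about how Jena evaluates it. All labels and data are made up for illustration (the collection "milk by animal" and the non-milk concept "beef" are my own assumptions), and the function names are hypothetical.

```python
# Toy data, invented for illustration only (not taken from a real thesaurus).
NARROWERS_OF_MILK = {"cow milk", "goat milk", "sheep milk"}

COLLECTIONS = {
    "milk by animal": {"cow milk", "goat milk", "sheep milk"},
    "goat products": {"goat milk"},
    "cow products": {"cow milk", "beef"},  # "beef" is not narrower than "milk"
    "meat cuts": {"beef"},                 # contains no narrower concept at all
}


def collections_double_negative(narrowers, collections):
    """Keep a collection that holds at least one narrower concept and for
    which NO member exists that is NOT a narrower concept."""
    return sorted(
        name
        for name, members in collections.items()
        if members & narrowers
        and not any(member not in narrowers for member in members)
    )


def collections_set_difference(narrowers, collections):
    """Same result phrased as a set difference, loosely analogous to MINUS:
    collections touching a narrower concept, minus those with an outsider."""
    touched = {n for n, members in collections.items() if members & narrowers}
    disqualified = {n for n, members in collections.items() if members - narrowers}
    return sorted(touched - disqualified)


if __name__ == "__main__":
    print(collections_double_negative(NARROWERS_OF_MILK, COLLECTIONS))
    print(collections_set_difference(NARROWERS_OF_MILK, COLLECTIONS))
```

Both functions keep "goat products" and "milk by animal" and drop "cow products", because the latter contains a concept outside the narrower set. The point is only that "all members are narrower" and "no member exists that is not narrower" select the same collections.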

I haven't figured out a way to write this query without a double negative pattern.

So I will just rewrite the query using MINUS and placing VALUES first in the WHERE block, and I hope it won't get de-optimized again in some future Jena release :)

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
