Hi Andy!
Thanks a lot for your answers; they gave me a lot of insight into this.
Andy Seaborne wrote on 17.12.2018 at 18:40:
>> 1. Should this be considered a bug, or is it just an obscure case of
>> the query optimizer working a bit differently than before?
>> 2. Can you recommend how I should fix the query so it won't blow up
>> again?
> Is this faster, and does it get the right answer?
Yes, it is as fast as before Jena 3.8.0 (<100ms for the original query,
<10ms for the minimal one) and the answer is correct.
> VALUES at start of WHERE.
I did this, and it gives nearly the same performance boost as switching
to MINUS. The version with MINUS seems to be slightly faster (59ms vs
66ms in one particular test run, repeated many times to confirm that the
small difference is indeed real).
> Looks like JENA-1534.
> Having VALUES as a whole-query additional clause, then use it in the
> outer and innermost levels, but not in between, stops the VALUES end
> clause being moved to the start of the WHERE block.
Right, this could well be the explanation.
> (I am having difficulty working out what the query is trying to do!)
Sure, it's a bit difficult...
I tried to explain in my previous mail. But now I also made a diagram:
https://docs.google.com/drawings/d/1nd-_pk3BEq2D_Cd1HkA_uHGhBI8KxUVYEHmVP9u80Ck/edit?usp=sharing
It has to do with SKOS concept hierarchies and collections (often called
arrays in thesaurus terminology). For a concept such as "milk", I want
to display the narrower concepts (e.g. "cow milk", "goat milk") grouped
by the collections/arrays they may be placed under. But I only want to
display collections/arrays *all* of whose concepts are narrower concepts
of "milk", such as <goat products>.
Or, put another way, when querying for narrowers of "milk" with their
arrays, I *don't* want to include arrays that contain at least one
concept that *isn't* a narrower concept of "milk". Thus the double
negative in the query: FILTER NOT EXISTS { ... FILTER NOT EXISTS { ... } }
(or MINUS in the new formulation).
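
To make the double negative concrete outside SPARQL, here is a minimal
Python sketch of the same logic. The toy data and all names in it
(narrowers, collections, fully_contained_collections, "drinks",
"orange juice", "empty array") are made up for illustration; they are not
taken from the actual query or thesaurus.

```python
# Toy data, invented for illustration only (not from the real thesaurus).
narrowers = {"milk": {"cow milk", "goat milk"}}
collections = {
    "goat products": {"goat milk"},
    "drinks": {"cow milk", "orange juice"},  # contains a non-narrower of "milk"
    "empty array": set(),
}


def fully_contained_collections(concept):
    """Return the non-empty collections whose members are all narrower than concept."""
    wanted = narrowers.get(concept, set())
    result = {}
    for name, members in collections.items():
        # Inner negation: a member is an "outsider" if it is NOT a narrower concept.
        outsiders = [m for m in members if m not in wanted]
        # Outer negation: keep the collection only if NO outsider exists.
        # The `members` check mirrors the requirement that the collection
        # contains at least one narrower concept in the first place.
        if members and not outsiders:
            result[name] = sorted(members)
    return result


print(fully_contained_collections("milk"))
```

With this toy data only "goat products" is kept: "drinks" is dropped
because one of its members is not a narrower concept of "milk".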
I haven't figured out a way to write this query in some other way that
wouldn't use a double negative pattern.
So I will just rewrite the query using MINUS and placing VALUES first in
the WHERE block, and I hope it won't get de-optimized again in some
future Jena release :)
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi