On 06/12/11 20:54, Andy Seaborne wrote:
On 6 December 2011 15:44, Jérôme<[email protected]> wrote:
Thank you Andy,
it was the cost of serializing and deserializing.
My second problem (yes, I have another one ;-) ) is:
By the way - replying to unrelated threads and changing the subject risks
your email not being seen. I, for one, don't always check threads that I'm
not involved in.
Yes, I am sorry. But when I wrote this e-mail, I thought the subject
"fuseki query performance" was appropriate...
The goal of my queries is to find "paragraphs" that contain "words"
matching a regex.
My triplestore holds approximately 1,600,000 triples.
For example: find paragraphs (in my RDF model) containing the word
"example". Here is the corresponding query:
PREFIX ram:<...>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?Response
WHERE
{
?Response rdf:type<http://www.tei-c.org/ns/1.0#p> .
?Objet_1 rdf:type<http://prodescartes.greyc.fr/annotations#word> .
?Objet_1 ram:contents ?Objet_1_content .
FILTER regex(?Objet_1_content,"example") .
?Response ram:contains ?Objet_1 .
}
I get the result in 0.5 seconds
Now, when I'm looking for paragraphs containing "example" and "help":
SELECT ?Response
WHERE
{
?Response rdf:type<http://www.tei-c.org/ns/1.0#p> .
?Objet_1 rdf:type<http://example.com#word> .
?Objet_1 ram:contents ?Objet_1_content .
FILTER regex(?Objet_1_content,"example") .
?Response ram:contains ?Objet_1 .
?Objet_2 rdf:type<http://example.com#word> .
?Objet_2 ram:contents ?Objet_2_content .
FILTER regex(?Objet_2_content,"help") .
?Response ram:contains ?Objet_2 .
}
I get the result in... 10 minutes. The result set has around 50 results.
Why does it take so long?
It's doing a cross-product of the results, but you're asking the question in
a complicated way.
Try:
SELECT ?Response
WHERE
{
?Response rdf:type<http://www.tei-c.org/ns/1.0#p> .
?Objet_1 rdf:type<http://example.com#word> .
?Objet_1 ram:contents ?Objet_1_content .
FILTER (regex(?Objet_1_content,"example")
&& regex(?Objet_1_content,"work") )
?Response ram:contains ?Objet_1 .
}
I think this query is not correct, because a single word can't match both
the "example" and "work" regexps.
Here is a very simplified (much information is missing) example of the data:
<paragraph>
[...]
</paragraph>
<paragraph>
<word>
<contents>this</contents>
</word>
<word>
<contents>work</contents>
</word>
<word>
<contents>is</contents>
</word>
<word>
<contents>an</contents>
</word>
<word>
<contents>example</contents>
</word>
</paragraph>
<paragraph>
[...]
</paragraph>
That's why I have to use two different objects in my example query: a
paragraph with the word "example" and with the word "work".
Isn't that right?
Thank you.
Jérôme
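The difference between the one-variable and two-variable forms of the query can be sketched in Python. This is only an illustration on made-up data that mirrors the simplified paragraph above; the dictionary `paragraphs` and both helper functions are hypothetical and are not part of Fuseki or the real data set. `re.search` is used because SPARQL `regex` also matches anywhere in the string unless the pattern is anchored.

```python
import re

# Hypothetical in-memory stand-in for the RDF data: each paragraph maps to
# the contents of the words it contains (mirrors the simplified example).
paragraphs = {
    "p1": ["this", "work", "is", "an", "example"],
    "p2": ["another", "example"],
}

def one_word_both_regexes(words, first, second):
    """Single-variable form: one word must match both patterns."""
    return any(re.search(first, w) and re.search(second, w) for w in words)

def two_words(words, first, second):
    """Two-variable form: each pattern may match a different word."""
    return any(re.search(first, w) for w in words) and any(
        re.search(second, w) for w in words
    )

single = [p for p, ws in paragraphs.items()
          if one_word_both_regexes(ws, "example", "work")]
double = [p for p, ws in paragraphs.items()
          if two_words(ws, "example", "work")]

print(single)  # []
print(double)  # ['p1']
```

With this toy data, the single-variable form finds nothing, because no individual word contains both strings, while the two-variable form returns the paragraph that holds both words.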
The "funniest" part is when I remove the constraints on words.
I remove these two lines:
?Objet_1 rdf:type<http://example.com#word> .
?Objet_2 rdf:type<http://example.com#word> .
Fuseki answers faster...
Less work to do.
With cross products in a query (two triple patterns not connected by sharing
a variable) there can be a multiplication of additional work. The
optimizer should have chosen a different strategy, but it is better to write
the query as above.
Thank you.
Jérôme
Andy
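The multiplication of work caused by unconnected patterns, as described above, can be illustrated with a small Python sketch. It is not how ARQ or Fuseki actually evaluates a query; it only counts the rows each strategy touches. The toy data (20 paragraphs of 5 words each) and the names `contains`, `content`, `unconnected_plan` and `connected_plan` are assumptions made for the illustration.

```python
import re
from itertools import product

# Hypothetical toy data (sizes are illustrative): 20 paragraphs, 5 words each.
contains = {p: [p * 5 + i for i in range(5)] for p in range(20)}
content = {w: "filler" for ws in contains.values() for w in ws}
content[35], content[36] = "example", "help"  # both words sit in paragraph 7

def unconnected_plan():
    """Enumerate every (paragraph, word, word) combination, then test it."""
    rows, hits = 0, set()
    for p, w1, w2 in product(contains, content, content):
        rows += 1
        if (w1 in contains[p] and w2 in contains[p]
                and re.search("example", content[w1])
                and re.search("help", content[w2])):
            hits.add(p)
    return hits, rows

def connected_plan():
    """Start from the selective filter and follow shared variables."""
    owner = {w: p for p, ws in contains.items() for w in ws}  # word -> paragraph
    rows, hits = 0, set()
    for w1, text in content.items():
        rows += 1
        if re.search("example", text):
            p = owner[w1]
            for w2 in contains[p]:  # only the words of that one paragraph
                rows += 1
                if re.search("help", content[w2]):
                    hits.add(p)
    return hits, rows

print(unconnected_plan())  # ({7}, 200000)
print(connected_plan())    # ({7}, 105)
```

Both strategies return the same paragraph, but the unconnected one visits 20 x 100 x 100 combinations, while the connected one scans the 100 words once and then only the 5 words of the matching paragraph.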