Re: Fuseki query performance

Jérôme Wed, 07 Dec 2011 01:02:14 -0800

On 06/12/11 20:54, Andy Seaborne wrote:

On 6 December 2011 15:44, Jérôme<[email protected]>  wrote:

Thank you Andy,

it was the cost of serializing and deserializing.

My second problem (yes, I have another one ;-) ) is:

By the way - replying to unrelated threads and changing the subject risks
your email not being seen.  I, for one, don't always check threads that I'm
not involved in.

Yes, I am sorry. But when I wrote this e-mail, I thought the subject "fuseki query performance" was appropriate...

The goal of my queries is to find "paragraphs" that contain
"words" matching a regex.
My triplestore holds approximately 1,600,000 triples.
For example: find paragraphs (in my RDF model) containing the word
"example". Here is the corresponding query:

PREFIX ram:<...>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?Response
WHERE
{
?Response rdf:type<http://www.tei-c.org/ns/1.0#p>  .
?Objet_1 rdf:type<http://prodescartes.greyc.fr/annotations#word>  .
?Objet_1 ram:contents ?Objet_1_content .
FILTER regex(?Objet_1_content,"example") .
?Response ram:contains ?Objet_1 .
}

I get the result in 0.5 seconds

Now, when I'm looking for paragraphs containing "example" and "help":

SELECT ?Response
WHERE
{

?Response rdf:type<http://www.tei-c.org/ns/1.0#p>  .

?Objet_1 rdf:type<http://example.com#word>  .
?Objet_1 ram:contents ?Objet_1_content .
FILTER regex(?Objet_1_content,"example") .
?Response ram:contains ?Objet_1 .

?Objet_2 rdf:type<http://example.com#word>  .
?Objet_2 ram:contents ?Objet_2_content .
FILTER regex(?Objet_2_content,"help") .
?Response ram:contains ?Objet_2 .

}

I get the result in...10 minutes. ResultSet is around 50 results.

Why does it take so long?

It's doing a cross-product of the results, but you're asking the question in a
complicated way.

try

SELECT ?Response
WHERE
{
   ?Response rdf:type<http://www.tei-c.org/ns/1.0#p>  .
   ?Objet_1 rdf:type<http://example.com#word>  .
   ?Objet_1 ram:contents ?Objet_1_content .
   FILTER (regex(?Objet_1_content,"example")
        &&  regex(?Objet_1_content,"work") )
   ?Response ram:contains ?Objet_1 .
}

I think this query is not correct, because a single word can't match both the "example" and "work" regexps.

Here is a very simplified example of the data (much information is missing):
<paragraph>
    [...]
</paragraph>

<paragraph>
<word>
<contents>this</contents>
</word>
<word>
<contents>work</contents>
</word>
<word>
<contents>is</contents>
</word>
<word>
<contents>an</contents>
</word>
<word>
<contents>example</contents>
</word>
</paragraph>

<paragraph>
    [...]
</paragraph>

That's why I have to use 2 different objects in my example query: a paragraph with the word "example" and with the word "work".

Isn't that right?
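
To make the point concrete, here is a minimal Python sketch (not SPARQL, and not how Fuseki evaluates anything). The list `paragraph` is a hypothetical stand-in for the sample paragraph above, and both function names are invented for illustration. It only shows that a single word variable with two regexes demands one word matching both, while two word variables let each regex match a different word.

```python
import re

# Hypothetical stand-in for the sample paragraph: one string per <word>.
paragraph = ["this", "work", "is", "an", "example"]

def one_word_matches_both(words, rx1, rx2):
    """One ?Objet_1 with both regexes in a single FILTER:
    the same word has to match both patterns."""
    return any(re.search(rx1, w) and re.search(rx2, w) for w in words)

def two_words_match(words, rx1, rx2):
    """Separate ?Objet_1 and ?Objet_2:
    each pattern may be matched by a different word."""
    return (any(re.search(rx1, w) for w in words)
            and any(re.search(rx2, w) for w in words))

single = one_word_matches_both(paragraph, "example", "work")
double = two_words_match(paragraph, "example", "work")
```

With this data, `single` is False and `double` is True, which is why the two-object form is the one that expresses the intended question.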

Thank you.
Jérôme

The "funniest" part is when I remove the type constraints on the words.
I remove these 2 lines:
?Objet_1 rdf:type<http://example.com#word>  .
?Objet_2 rdf:type<http://example.com#word>  .

Fuseki answers me faster...

Less work to do.

With cross products in a query (two triple patterns not connected by sharing
a variable) there can be a multiplication of additional work.  The
optimizer should have chosen a different strategy, but it is better to write
the query as above.
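
The multiplication effect can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not a description of the actual Fuseki/ARQ execution plan: the data, `cross_product_plan` and `connected_plan` are all hypothetical names made up for the illustration. The first function binds the paragraph and both words independently and filters afterwards; the second starts from the words that match the first regex and only follows the "contains" link.

```python
import re
from itertools import product

# Toy data (hypothetical): paragraph id -> the words it contains.
paragraphs = {
    "p1": ["this", "work", "is", "an", "example"],
    "p2": ["no", "match", "here"],
    "p3": ["another", "example"],
}
# One (word id, text, owning paragraph) tuple per word occurrence.
words = [(f"{p}_w{i}", text, p)
         for p, texts in paragraphs.items()
         for i, text in enumerate(texts)]

def cross_product_plan(rx1, rx2):
    """Bind ?Response, ?Objet_1 and ?Objet_2 independently, then filter."""
    examined, hits = 0, set()
    for p, (_, t1, p1), (_, t2, p2) in product(paragraphs, words, words):
        examined += 1
        if p == p1 == p2 and re.search(rx1, t1) and re.search(rx2, t2):
            hits.add(p)
    return hits, examined

def connected_plan(rx1, rx2):
    """Start from words matching rx1, then look only inside their paragraph."""
    examined, hits = 0, set()
    for _, t1, p in words:
        examined += 1
        if not re.search(rx1, t1):
            continue
        for t2 in paragraphs[p]:
            examined += 1
            if re.search(rx2, t2):
                hits.add(p)
    return hits, examined

slow_hits, slow_examined = cross_product_plan("example", "work")
fast_hits, fast_examined = connected_plan("example", "work")
```

On this toy data both plans find the same paragraph, but the first examines 300 combinations and the second 17. The gap grows multiplicatively with the number of paragraphs and words, so at 1,600,000 triples the difference can be large.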

Thank you.
Jérôme

Andy
