On 6 December 2011 15:44, Jérôme <[email protected]> wrote:
> Thank you Andy,
>
> it was the cost of serializing and deserializing.
>
> My second problem (yes, i have another one ;-) ) is:
>

By the way - replying to unrelated threads and changing the subject risks
your email not being seen. I, for one, don't always check threads that I'm
not involved in.

>
> The goal of my queries is to find "paragraphs" which are containing
> "words" which are matching a regex.
> My triplestore stores approximately 1.600.000 triples.
> For example: find paragraphs (in my RDF model) containing the word
> "example" - here the corresponding query:
>
> PREFIX ram:<...>
> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>
> SELECT ?Response
> WHERE
> {
>   ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
>   ?Objet_1 rdf:type <http://prodescartes.greyc.fr/annotations#word> .
>   ?Objet_1 ram:contents ?Objet_1_content .
>   FILTER regex(?Objet_1_content,"example") .
>   ?Response ram:contains ?Objet_1 .
> }
>
> I get the result in 0.5 seconds
>
> Now, when i'm looking for paragraphs containing "example" and "help":
>
> SELECT ?Response
> WHERE
> {
>   ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
>
>   ?Objet_1 rdf:type <http://example.com#word> .
>   ?Objet_1 ram:contents ?Objet_1_content .
>   FILTER regex(?Objet_1_content,"example") .
>   ?Response ram:contains ?Objet_1 .
>
>   ?Objet_2 rdf:type <http://example.com#word> .
>   ?Objet_2 ram:contents ?Objet_2_content .
>   FILTER regex(?Objet_2_content,"help") .
>   ?Response ram:contains ?Objet_2 .
> }
>
> I get the result in...10 minutes. ResultSet is around 50 results.
>
> Why is it so long?
>

It's doing a cross-product of the results, but you're asking the question
in a complicated way. Try:

SELECT ?Response
WHERE
{
  ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
  ?Objet_1 rdf:type <http://example.com#word> .
  ?Objet_1 ram:contents ?Objet_1_content .
  FILTER (regex(?Objet_1_content,"example") && regex(?Objet_1_content,"work") )
  ?Response ram:contains ?Objet_1 .
}

>
> The "funniest" is when i remove constraints on words:
> I remove those 2 lines:
>   ?Objet_1 rdf:type <http://example.com#word> .
>   ?Objet_2 rdf:type <http://example.com#word> .
>
> Fuseki answers me faster...
>

Less work to do. With cross products in a query (two triple patterns not
connected by sharing a variable), there can be a multiplication of
additional work. The optimizer should have chosen a different strategy,
but it is better to write the query as above.

>
> Thank you.
> Jérôme
>

        Andy
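
The cross-product remark can be illustrated with a minimal Python sketch. This is a toy model, not how Fuseki or ARQ evaluates queries: the data, the `contains` mapping and both function names are hypothetical, and the "join" is a plain list operation. It only shows how many intermediate rows appear when two patterns that share no variable are combined before the pattern that links them.

```python
from itertools import product

# Hypothetical toy data: 3 paragraphs, each containing 4 words.
paragraphs = [f"p{i}" for i in range(3)]
# Maps each word to the paragraph that contains it (stands in for ram:contains).
contains = {f"w{i}_{j}": f"p{i}" for i in range(3) for j in range(4)}


def unconnected_order():
    """Combine the paragraph and word patterns first, then apply 'contains'.

    The two patterns share no variable, so every paragraph is paired
    with every word before the linking pattern filters the pairs.
    """
    intermediate = list(product(paragraphs, contains))  # cross product
    rows = [(p, w) for p, w in intermediate if contains[w] == p]
    return len(intermediate), rows


def connected_order():
    """Start from the words and follow 'contains' to the paragraph.

    Each step shares a variable with the previous one, so no
    cross product is built.
    """
    rows = [(contains[w], w) for w in contains]
    return len(rows), rows


n_cross, rows_cross = unconnected_order()
n_conn, rows_conn = connected_order()
print(n_cross, n_conn)  # intermediate row counts for each order
```

Both orders return the same paragraph/word pairs; only the amount of intermediate work differs, and that difference grows with the product of the two pattern sizes.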
