Hi Bill,

The stats optimizer is preferring the property it knows about over the variable ?p, which it knows nothing about. You can add a stats rule to tell the optimizer about triple patterns with a constant subject: e.g. at the end of the stats file you sent me, add the line "((TERM ANY ANY) 10)":

...
  (<http://education.data.gov.uk/...> 24336)
  ((TERM ANY ANY) 10)
  (other 0))

10 is a guess - lower numbers make the optimizer favour the
"<a-specific-uri> ?p ?key" part more strongly.
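
As a rough illustration of why the extra rule helps, here is a toy Python model of weight-based pattern ordering. It is only a sketch and not TDB's actual reorder code: the slot-matching semantics, the choice to place patterns with no matching rule after the known ones, and the rdfs:label count of 1000000 are all assumptions made up for the example.

```python
# Toy model of stats-based triple pattern ordering. NOT TDB's real algorithm.
ANY, TERM = "ANY", "TERM"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

def slot_matches(rule_slot, term):
    """Assumed semantics: ANY matches anything, TERM matches any non-variable."""
    if rule_slot == ANY:
        return True
    if rule_slot == TERM:
        return not term.startswith("?")
    return rule_slot == term  # otherwise the rule names one exact URI

def weight(pattern, rules):
    """Weight of the first matching rule, or None if no rule matches."""
    for rule, w in rules:
        if all(slot_matches(r, t) for r, t in zip(rule, pattern)):
            return w
    return None

def order(patterns, rules):
    """Lowest weight first; patterns with no matching rule go last (assumption)."""
    def sort_key(p):
        w = weight(p, rules)
        return (w is None, 0 if w is None else w)
    return sorted(patterns, key=sort_key)

p1 = ("<a-specific-uri>", "?p", "?key")
p2 = ("?key", RDFS_LABEL, "?label")

# Hypothetical count for rdfs:label; the real figure is in the stats file.
rules_before = [((ANY, RDFS_LABEL, ANY), 1000000)]
rules_after = rules_before + [((TERM, ANY, ANY), 10)]

print(order([p1, p2], rules_before))
print(order([p1, p2], rules_after))
```

In this model the rdfs:label pattern is ordered first when only the predicate rule exists, and the constant-subject pattern moves to the front once the (TERM ANY ANY) rule gives it a low weight.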


Could you let me know if that changes things measurably on the real data?

Maybe this ought to always go in the stats file. (That needs careful thought because if it's wrong, it's potentially a bit nasty.)

        Andy



On 02/04/11 19:58, Andy Seaborne wrote:
Quick answer, longer one to follow:

Could you try using the "fixed.opt", removing "stats.opt" and let me
know what happens?

Andy


On 01/04/11 21:17, Bill Roberts wrote:
See below - sorry, realised this message is more appropriate for
jena-users than jena-dev

Begin forwarded message:

From: Bill Roberts<[email protected]>
Date: 1 April 2011 19:44:20 GMT+01:00
To: [email protected]
Bcc: Ric Roberts<[email protected]>
Subject: Problems with TDB Optimizer

I've come across some unexpected (to me!) behaviour of the TDB
Optimizer and wondering if someone could shed any light on it.

For our database (around 30 million triples, 350-odd different
predicates, around 50 named graphs, using UnionDefaultGraph - everything
is in a named graph), we've found that including the stats.opt file
makes some queries significantly slower than having no optimizer.

Some relatively complex queries run quite quickly and probably a bit
quicker with optimization than without. But in other cases, quite simple
queries run a lot slower - maybe 10 or 20 times slower with stats.opt in
place than they do without it.

Is this known behaviour?

Here's an example:

SELECT ?key ?label WHERE { <a-specific-uri> ?p ?key .
?key <http://www.w3.org/2000/01/rdf-schema#label> ?label }

This query took around 30 seconds with stats.opt in place, and less
than 2 seconds without it. (Some of that 2 seconds would have been HTTP
transfer and web page rendering time).

We're currently running TDB 0.8.9 and Joseki 3.4.3 on 64-bit Ubuntu.
(Though I've found similar behaviour on 32-bit Ubuntu with slightly
older versions of TDB and Joseki).

Thanks!

Bill

