Hi Bill,

The stats optimizer is preferring the property it knows about to the variable ?p, which it knows nothing about. You can add a rule to the stats file to tell the optimizer about triple patterns with a constant subject, e.g. at the end of the stats file you sent me, add a line "((TERM ANY ANY) 10)":

    ...
    (<http://education.data.gov.uk/...> 24336)
    ((TERM ANY ANY) 10)
    (other 0))

10 is a guess - lower numbers will increase the favour of the "<a-specific-uri> ?p ?key" part.

Could you let me know if that changes things measurably on the real data?

Maybe this ought to always go in the stats file. (That needs careful thought because if it's wrong, it's potentially a bit nasty.)

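To make the effect of the extra rule concrete, here is a toy Python model of weight-based reordering. It is only a sketch and not TDB's actual code: the rdfs:label count (100000) is a made-up number, the "(other 0)" default is left out, and the choice to run patterns with no matching rule last is an assumption that simply mirrors the behaviour described above.

```python
# Toy model of rule-based triple pattern reordering (NOT Jena/TDB source code).
VAR = "VAR"    # rule slot: matches a variable only
TERM = "TERM"  # rule slot: matches a constant (URI, literal) only
ANY = "ANY"    # rule slot: matches a variable or a constant

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def is_var(slot):
    """Query variables are written ?name."""
    return slot.startswith("?")


def slot_matches(rule_slot, slot):
    if rule_slot == ANY:
        return True
    if rule_slot == VAR:
        return is_var(slot)
    if rule_slot == TERM:
        return not is_var(slot)
    return rule_slot == slot  # a specific URI in the rule


def weight(pattern, rules):
    """Weight of the first rule that matches, or None if no rule matches."""
    for rule_pattern, w in rules:
        if all(slot_matches(r, s) for r, s in zip(rule_pattern, pattern)):
            return w
    return None


def reorder(patterns, rules):
    """Lowest weight first; patterns without a matching rule go last (assumption)."""
    def key(pattern):
        w = weight(pattern, rules)
        return (w is None, 0 if w is None else w)
    return sorted(patterns, key=key)


by_subject = ("<a-specific-uri>", "?p", "?key")
by_label = ("?key", RDFS_LABEL, "?label")
query = [by_subject, by_label]

# Hypothetical stats: only the predicate count is known (100000 is invented).
stats_only = [((VAR, RDFS_LABEL, VAR), 100000)]
# Same stats plus the suggested constant-subject rule.
with_rule = stats_only + [((TERM, ANY, ANY), 10)]

print(reorder(query, stats_only))  # label pattern first
print(reorder(query, with_rule))   # constant-subject pattern first
```

In this model the label pattern runs first when only the predicate count is known, and the constant-subject pattern runs first once the (TERM ANY ANY) rule is present, which is the order the query wants.
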
Andy
On 02/04/11 19:58, Andy Seaborne wrote:
Quick answer: longer to follow:

Could you try using the "fixed.opt", removing "stats.opt" and let me know what happens?

Andy

On 01/04/11 21:17, Bill Roberts wrote:

See below - sorry, realised this message more appropriate for jena-users than jena-dev

Begin forwarded message:

From: Bill Roberts <[email protected]>
Date: 1 April 2011 19:44:20 GMT+01:00
To: [email protected]
Bcc: Ric Roberts <[email protected]>
Subject: Problems with TDB Optimizer

I've come across some unexpected (to me!) behaviour of the TDBOptimizer and wondering if someone could shed any light on it.

For our database, (around 30 million triples, 350-odd different predicates, around 50 named graphs, using UnionDefaultGraph - everything is in a named graph), we've found that including the stats.opt file makes some queries significantly slower than having no optimizer.

Some relatively complex queries run quite quickly and probably a bit quicker with optimization than without. But in other cases, quite simple queries run a lot slower - maybe 10 or 20 times slower with stats.opt in place than they do without it.

Is this known behaviour? Here's an example:

SELECT ?key ?label
WHERE {
  <a-specific-uri> ?p ?key .
  ?key <http://www.w3.org/2000/01/rdf-schema#label> ?label
}

This query took around 30 seconds with stats.opt in place, and less than 2 seconds without it. (Some of that 2 seconds would have been HTTP transfer and web page rendering time).

We're currently running TDB 0.8.9 and Joseki 3.4.3 on 64 bit Ubuntu. (Though I've found similar behaviour on 32-bit Ubuntu with slightly older versions of TDB and Joseki).

Thanks!

Bill
