Andy

Sorry for the slow response.  I tried adding the line ((TERM ANY ANY) 10) 
to the end of the stats.opt file, and that brought the speed of that query to 
about the same as with no optimiser.  So thanks very much for that. It has more 
or less solved the problem, but I need to run more tests on a broader range of 
queries to find the cases where the optimiser is actively helping (as opposed 
to no longer slowing things down!).

Presumably those are the cases where the query has multiple clauses and the 
order in which they are evaluated is significant.
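
As a rough illustration of that point, here is a small Python sketch of how weight-based reordering could behave for the query in this thread. It is a toy model under stated assumptions, not TDB's actual algorithm: the first-match rule lookup, the choice to put unmatched patterns last, and the label count of 500000 are all made up for the example, and the names (weight, reorder, RULES_ORIGINAL, RULES_WITH_FIX) are hypothetical.

```python
import math

VAR, TERM, ANY = "VAR", "TERM", "ANY"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def is_var(slot):
    """A slot is a variable if it starts with '?'."""
    return slot.startswith("?")


def slot_matches(rule_slot, pattern_slot):
    """Check one position of a rule against one position of a triple pattern."""
    if rule_slot == ANY:
        return True
    if rule_slot == VAR:
        return is_var(pattern_slot)
    if rule_slot == TERM:
        return not is_var(pattern_slot)
    return rule_slot == pattern_slot  # a concrete URI must match exactly


def weight(pattern, rules):
    """Return the weight of the first rule that matches the pattern."""
    for rule_pattern, rule_weight in rules:
        if all(slot_matches(r, p) for r, p in zip(rule_pattern, pattern)):
            return rule_weight
    # Assumption for this sketch: a pattern no rule knows about goes last.
    return math.inf


def reorder(patterns, rules):
    """Evaluate the cheapest-looking pattern first (stable for ties)."""
    return sorted(patterns, key=lambda p: weight(p, rules))


QUERY = [
    ("<a-specific-uri>", "?p", "?key"),
    ("?key", RDFS_LABEL, "?label"),
]

# Hypothetical count for rdfs:label; the real stats file is not reproduced here.
RULES_ORIGINAL = [((VAR, RDFS_LABEL, VAR), 500000)]
RULES_WITH_FIX = RULES_ORIGINAL + [((TERM, ANY, ANY), 10)]

without_rule = reorder(QUERY, RULES_ORIGINAL)
with_rule = reorder(QUERY, RULES_WITH_FIX)

print("without (TERM ANY ANY):", without_rule[0])
print("with (TERM ANY ANY):   ", with_rule[0])
```

In this toy model the label pattern is evaluated first when no rule covers the constant-subject pattern, and the constant-subject pattern moves to the front once the ((TERM ANY ANY) 10) rule is present. A real optimiser would also take into account variables bound by earlier patterns, which this sketch ignores.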

Anyway, thanks a lot for your help.

Cheers

Bill


On 4 Apr 2011, at 15:47, Andy Seaborne wrote:

> Hi Bill,
> 
> The stats optimizer is preferring the pattern with the property it knows 
> about over the one with the variable ?p, which it knows nothing about.  You 
> can add a stats rule to the file to tell the optimizer about a triple pattern 
> with a constant subject: e.g. at the end of the stats file you sent me, add 
> a line "((TERM ANY ANY) 10)"
> 
> ...
>  (<http://education.data.gov.uk/...> 24336)
>  ((TERM ANY ANY) 10)
>  (other 0))
> 
> 10 is a guess - lower numbers will favour the
> "<a-specific-uri> ?p ?key" part more strongly.
> 
> 
> Could you let me know if that changes things measurably on the real data?
> 
> Maybe this ought to always go in the stats file.  (That needs careful 
> thought because if it's wrong, it's potentially a bit nasty.)
> 
>       Andy
> 
> 
> 
> On 02/04/11 19:58, Andy Seaborne wrote:
>> Quick answer, with a longer one to follow:
>> 
>> Could you try using the "fixed.opt", removing "stats.opt" and let me
>> know what happens?
>> 
>> Andy
>> 
>> 
>> On 01/04/11 21:17, Bill Roberts wrote:
>>> See below - sorry, I realised this message is more appropriate for
>>> jena-users than jena-dev
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: Bill Roberts<[email protected]>
>>>> Date: 1 April 2011 19:44:20 GMT+01:00
>>>> To: [email protected]
>>>> Bcc: Ric Roberts<[email protected]>
>>>> Subject: Problems with TDB Optimizer
>>>> 
>>>> I've come across some unexpected (to me!) behaviour of the TDB
>>>> Optimizer and am wondering if someone could shed some light on it.
>>>> 
>>>> For our database (around 30 million triples, 350-odd different
>>>> predicates, around 50 named graphs, using UnionDefaultGraph - everything
>>>> is in a named graph), we've found that including the stats.opt file
>>>> makes some queries significantly slower than having no optimizer.
>>>> 
>>>> Some relatively complex queries run quite quickly and probably a bit
>>>> quicker with optimization than without. But in other cases, quite simple
>>>> queries run a lot slower - maybe 10 or 20 times slower with stats.opt in
>>>> place than they do without it.
>>>> 
>>>> Is this known behaviour?
>>>> 
>>>> Here's an example:
>>>> 
>>>> SELECT ?key ?label WHERE {
>>>>   <a-specific-uri> ?p ?key .
>>>>   ?key <http://www.w3.org/2000/01/rdf-schema#label> ?label
>>>> }
>>>> 
>>>> This query took around 30 seconds with stats.opt in place, and less
>>>> than 2 seconds without it. (Some of that 2 seconds would have been HTTP
>>>> transfer and web page rendering time).
>>>> 
>>>> We're currently running TDB 0.8.9 and Joseki 3.4.3 on 64 bit Ubuntu.
>>>> (Though I've found similar behaviour on 32-bit Ubuntu with slightly
>>>> older versions of TDB and Joseki).
>>>> 
>>>> Thanks!
>>>> 
>>>> Bill
>>> 
>>> 
