On 12/06/14 03:35, DongNing(董宁.阿帕比) wrote:
Thanks Andy!
To go into more detail on question 2:
Suppose the triple store contains the following:
        S1 :identifier P1
        S2 :identifier P2
        S3 :identifier P3
        S4 :identifier P4
        S5 :identifier P5

        The count for (var  :identifier  TERM) is 1
        The count for (var  :identifier  var ) is 5
Is that right?

Yes

But if the triples are like these:
        S1 :identifier P1
        S2 :identifier P1
        S3 :identifier P1
        S4 :identifier P1
        S5 :identifier P1
        
Is the count for (var  :identifier  TERM) 1 or 5? I think it is 5.

5

The count for (var  :identifier  var ) is 5.
Is that right?
One more situation: if the triples are like these
        S1 :identifier P1
        S2 :identifier P1
        S3 :identifier P2
        S4 :identifier P2
        S5 :identifier P3
What is the count for (var  :identifier  TERM)?

Overall points first:

* The optimizer is not trying to find the perfect answer; it's trying to find a reasonable answer, mainly by deciding between alternatives. To some extent its role in life is avoiding the bad as much as finding the good!

* The stats optimizer isn't a perfect scheme (see the RDF-3X papers for more discussion) because it only considers triples independently. The stats are an approximation.

See also the current fixed optimizer.


(var :identifier TERM): maybe 2. It's not about exactness; only the first triple pattern gets an exact lookup, where you could have

(var  :identifier  P1) 2
(var  :identifier  P2) 2
(var  :identifier  P3) 1
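
One plausible way to arrive at "maybe 2" is to average the exact per-object counts listed above. The sketch below does that for the third data set in this thread. It is only an illustration: the averaging rule and the variable names are assumptions, not the way TDB computes its statistics.

```python
from collections import Counter

# Third data set from this thread.
triples = [
    ("S1", ":identifier", "P1"),
    ("S2", ":identifier", "P1"),
    ("S3", ":identifier", "P2"),
    ("S4", ":identifier", "P2"),
    ("S5", ":identifier", "P3"),
]

# Exact count for each specific object: what a direct lookup would return.
per_object = Counter(o for (_s, p, o) in triples if p == ":identifier")

# (var :identifier var): every use of the predicate.
total = sum(per_object.values())

# (var :identifier TERM): the object is bound but unknown when the plan is
# made, so a single number has to stand in for all objects. Here it is the
# average number of triples per distinct object (an assumed rule).
average = total / len(per_object)
estimate = round(average)

print(dict(per_object))   # {'P1': 2, 'P2': 2, 'P3': 1}
print(total, estimate)    # 5 2
```

The estimate is wrong for P3 (1, not 2), which is acceptable: it only has to be good enough to rank one pattern against another.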

It could reorder after every pattern but that might end up with the optimizer costing more than the execution.
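
To make the trade-off concrete, here is a minimal sketch of a reorder that uses only the estimates and never looks at the data between patterns. It is not Jena's code: the function names, the stats table, and the :name predicate with its numbers are all hypothetical.

```python
def is_var(term):
    # SPARQL-style variables start with "?" in this sketch.
    return term.startswith("?")


def estimate(pattern, bound, stats):
    """Look up the estimated count for a pattern, given already-bound variables.

    A variable that an earlier pattern has bound is treated as TERM
    ("will be bound at this point"). The predicate is assumed to be a constant.
    """
    s, p, o = pattern
    key = (
        "VAR" if is_var(s) and s not in bound else "TERM",
        p,
        "VAR" if is_var(o) and o not in bound else "TERM",
    )
    return stats[key]


def reorder(patterns, stats):
    """Greedy ordering: repeatedly pick the pattern with the lowest estimate."""
    remaining = list(patterns)
    bound = set()
    ordered = []
    while remaining:
        best = min(remaining, key=lambda pat: estimate(pat, bound, stats))
        remaining.remove(best)
        ordered.append(best)
        # Variables of the chosen pattern are bound for all later patterns.
        bound.update(t for t in best if is_var(t))
    return ordered


# Hypothetical statistics; only the :identifier numbers echo this thread.
stats = {
    ("VAR", ":identifier", "VAR"): 5,
    ("VAR", ":identifier", "TERM"): 2,
    ("TERM", ":identifier", "VAR"): 1,
    ("TERM", ":identifier", "TERM"): 1,
    ("VAR", ":name", "VAR"): 100,
    ("VAR", ":name", "TERM"): 3,
    ("TERM", ":name", "VAR"): 1,
    ("TERM", ":name", "TERM"): 1,
}

patterns = [("?s", ":name", "?n"), ("?s", ":identifier", "P1")]
ordered = reorder(patterns, stats)
print(ordered)
```

With these numbers the :identifier pattern is chosen first (estimate 2 against 100), after which ?s counts as bound and the :name pattern is looked up as (TERM :name VAR). Re-planning against the real intermediate results after each pattern would be more accurate, but it would move that work into query execution.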

        Andy

                                                                                
                        Thanks again!
                                                                                
                                                                                
                                Tony

-----Original Message-----
From: Andy Seaborne [mailto:[email protected]]
Sent: 12 June 2014 2:08
To: [email protected]
Subject: Re: TDB OPTIMIZER question:a puzzled of RULE language about " VAR and TERM "

On 11/06/14 08:03, DongNing(董宁.阿帕比) wrote:
Hi all:

I am a beginner with Jena and I am studying TDB’s optimizer, in
particular the statistics rules.

1.       I think the difference between TERM and VAR is that VAR represents
a variable in SPARQL. TERM only represents a possible value in the DB; it
doesn’t represent a variable in SPARQL.

Is that right?

Yes - TERM means "will be bound at this point"


2.       For a static graph DB (the triples are fixed and do not change),

should the count for (var  :identifier  TERM) and the count for
(var :identifier var) be the same?

No.

(var  :identifier  TERM) should be an estimate of the cardinality when
there is a specific value.  (var :identifier var) would be the count of all
uses of :identifier.

If :ifp is an inverse functional property,

(?x :ifp TERM) is one.
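
As a rough illustration of why the two counts differ, the sketch below counts the triples matching each pattern in two of the small data sets from this thread. The helper count_matches and the use of None as a wildcard are assumptions made for the example; they are not part of Jena.

```python
def count_matches(triples, s=None, p=None, o=None):
    """Count triples matching a pattern; None acts as a variable (wildcard)."""
    return sum(
        1
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    )


# Every subject has a different object, as an inverse functional
# property would require.
distinct_objects = [(f"S{i}", ":identifier", f"P{i}") for i in range(1, 6)]

# Every subject shares the same object.
shared_object = [(f"S{i}", ":identifier", "P1") for i in range(1, 6)]

# (var :identifier var): all uses of the predicate.
print(count_matches(distinct_objects, p=":identifier"))          # 5

# (var :identifier TERM) with one specific value filled in for TERM.
print(count_matches(distinct_objects, p=":identifier", o="P1"))  # 1
print(count_matches(shared_object, p=":identifier", o="P1"))     # 5
```

A statistics rule has to pick one number for (var :identifier TERM) before the actual value is known, so what it records is an estimate of these per-value counts rather than any single one of them.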


3.       There are a few explanations and samples on
http://jena.apache.org . Are there any other tutorials about the statistics
rules?

Only the code, I'm afraid.

        Andy




THANKS!

Tony.Dong




