On 12/06/14 03:35, DongNing(董宁.阿帕比) wrote:
Thanks Andy!
More detail on question 2:
Suppose the triple store contains the following:
S1 :identifier P1
S2 :identifier P2
S3 :identifier P3
S4 :identifier P4
S5 :identifier P5
The count for (var :identifier TERM) is 1.
The count for (var :identifier var) is 5.
Is that right?
Yes
But suppose the triples are these:
S1 :identifier P1
S2 :identifier P1
S3 :identifier P1
S4 :identifier P1
S5 :identifier P1
Is the count for (var :identifier TERM) 1 or 5? I think it is 5.
5
The count for (var :identifier var) is 5.
Is that right?
One more case: suppose the triples are these:
S1 :identifier P1
S2 :identifier P1
S3 :identifier P2
S4 :identifier P2
S5 :identifier P3
What is the count for (var :identifier TERM)?
Overall points first:
* the optimizer is not trying to find the perfect answer, it's trying to
find a reasonable answer, mainly deciding between alternatives. And to
some extent its role in life is avoiding the bad as much as finding the
good!
* The stats optimizer isn't a perfect scheme (see the RDF-3X papers for
more discussion) because it considers each triple pattern independently.
The stats are an approximation.
See also the current fixed optimizer.
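As a rough illustration of "deciding between alternatives", the sketch below orders triple patterns by an estimated match count, lowest first. The function name order_by_estimate, the example patterns, and the numbers are all made up for illustration; this is not TDB's code, only the general idea of preferring the pattern expected to match the fewest triples.

```python
# Hypothetical sketch: choose an execution order from per-pattern estimates.
def order_by_estimate(patterns, estimate):
    """Return the patterns sorted so the smallest estimated count comes first."""
    return sorted(patterns, key=estimate)

# Illustrative estimates only (not taken from a real stats file).
estimates = {
    ("?s", ":identifier", "TERM"): 2,
    ("?s", ":identifier", "?o"): 5,
}

patterns = [("?s", ":identifier", "?o"), ("?s", ":identifier", "TERM")]
ordered = order_by_estimate(patterns, estimates.get)
print(ordered[0])  # the pattern with the lower estimate
```

The estimates only need to be good enough to rank the alternatives, which is why an approximate number is acceptable.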
(var :identifier TERM): maybe 2. It's not about exactness; only the
first triple pattern gets an exact lookup, where you could have
(var :identifier P1) 2
(var :identifier P2) 2
(var :identifier P3) 1
It could reorder after every pattern but that might end up with the
optimizer costing more than the execution.
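To make the numbers concrete, here is a small sketch that counts the triples per object value for the last data set and then collapses them into a single figure. Taking the rounded average is an assumption made here to show one way of arriving at "maybe 2"; it is not necessarily how a TDB stats file is produced.

```python
from collections import Counter

# The five triples from the last example in the question.
triples = [
    ("S1", ":identifier", "P1"),
    ("S2", ":identifier", "P1"),
    ("S3", ":identifier", "P2"),
    ("S4", ":identifier", "P2"),
    ("S5", ":identifier", "P3"),
]

# Exact count for each specific object: (var :identifier P1), (var :identifier P2), ...
per_value = Counter(o for _, p, o in triples if p == ":identifier")

# Count for (var :identifier var): every use of :identifier.
total = sum(per_value.values())

# One single-number estimate for (var :identifier TERM): the rounded average
# number of triples per distinct object (an assumption for illustration).
estimate = round(total / len(per_value))

print(dict(per_value), total, estimate)
```

The per-value counts are what an exact lookup would return; the single estimate is what a rule has to use when the actual value is not known until execution.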
Andy
Thanks again!
Tony
-----Original Message-----
From: Andy Seaborne [mailto:[email protected]]
Sent: 2014-06-12 2:08
To: [email protected]
Subject: Re: TDB OPTIMIZER question:a puzzled of RULE language about " VAR and TERM "
On 11/06/14 08:03, DongNing(董宁.阿帕比) wrote:
Hi all:
I am a beginner with Jena and I am studying TDB's optimizer, in particular
the statistics rules.
1. I think the difference between TERM and VAR is that VAR represents a
variable in SPARQL, while TERM only represents a possible value in the DB; it
does not represent a variable in SPARQL.
Is that right?
Yes - TERM means "will be bound at this point"
2. For a static graph DB (the triples are fixed and do not change), should the
count for (var :identifier TERM) and the count for (var :identifier var)
be the same?
No.
(var :identifier TERM) should be an estimate of the cardinality when
there is a specific value. (var :identifier var) would be the count of all uses of
:identifier.
If :ifp is an inverse functional property,
(?x :ifp TERM) is one.
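The difference between the two patterns can be checked by brute force on a small data set. The sketch below is illustrative Python, not Jena code; count_matches is a hypothetical helper in which None plays the role of a variable. The first data set gives every subject its own object, so :identifier behaves like an inverse functional property there; the second gives every subject the same object.

```python
def count_matches(triples, pattern):
    """Count triples matching the pattern; None in a slot acts like a variable."""
    return sum(
        all(want is None or want == got for want, got in zip(pattern, triple))
        for triple in triples
    )

# Each subject has its own object (behaves like an inverse functional property).
unique_objects = [(f"S{i}", ":identifier", f"P{i}") for i in range(1, 6)]
# Every subject shares the same object.
shared_object = [(f"S{i}", ":identifier", "P1") for i in range(1, 6)]

specific = (None, ":identifier", "P1")   # like (var :identifier TERM) with TERM = P1
all_uses = (None, ":identifier", None)   # like (var :identifier var)

unique_specific = count_matches(unique_objects, specific)
unique_all = count_matches(unique_objects, all_uses)
shared_specific = count_matches(shared_object, specific)
shared_all = count_matches(shared_object, all_uses)

print(unique_specific, unique_all)   # 1 5
print(shared_specific, shared_all)   # 5 5
```

The count of all uses is the same in both cases, while the count for a specific value depends on how the objects are distributed, which is why the two rules carry different numbers.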
3. There are only a few explanations and samples on
http://jena.apache.org. Are there any other tutorials about the statistics
rules?
Only the code, I'm afraid.
Andy
Thanks!
Tony.Dong