On 12/06/14 14:30, Claude Warren wrote:
Quick question:

Would it make sense to have an immutable flag that would tell the optimizer
(or other processes) that a dataset/model/graph is not likely to change?

   More of a hint rather than a rule.

No point - the stats are assumed to be good enough until invalidated externally. They are not (currently) dynamically maintained, although a daemon process could sweep past and update them.

c.f. PostgreSQL ANALYZE

        Andy







On Thu, Jun 12, 2014 at 2:15 PM, Rob Vesse <[email protected]> wrote:

You may be interested in the following paper -
http://www.csd.uoc.gr/~hy561/papers/storageaccess/optimization/Characteristic%20Sets.pdf
- on a technique called RDF Characteristic Sets

It tries to solve the problem Andy alludes to, that most stats-based
optimisers consider triple patterns in isolation from each other rather than
as complete units.  The downside of the RDF Characteristic Sets approach
is that they are potentially very expensive to calculate and would be
awkward to maintain for mutable data sets.

Rob

On 12/06/2014 13:36, "Andy Seaborne" <[email protected]> wrote:

On 12/06/14 03:35, DongNing(董宁.阿帕比) wrote:
Thanks Andy!
For more detail on question 2:
Suppose the triples in the DB are as below:
      S1 :identifier P1
      S2 :identifier P2
      S3 :identifier P3
      S4 :identifier P4
      S5 :identifier P5

      The count for (var  :identifier  TERM) is 1
      The count for (var  :identifier  var ) is 5
Is that OK?

Yes

But if the triples are like these:
      S1 :identifier P1
      S2 :identifier P1
      S3 :identifier P1
      S4 :identifier P1
      S5 :identifier P1

Is the count for (var  :identifier  TERM) 1 or 5? I think it is 5.

5

The count for (var  :identifier  var ) is 5.
Is that OK?
As a further case, if the triples are like these:
      S1 :identifier P1
      S2 :identifier P1
      S3 :identifier P2
      S4 :identifier P2
      S5 :identifier P3
what is the count for (var  :identifier  TERM)?

Overall points first:

* the optimizer is not trying to find the perfect answer, it's trying to
find a reasonable answer, mainly deciding between alternatives.  And to
some extent its role in life is avoiding the bad as much as finding the
good!

* The stats optimizer isn't a perfect scheme (see the RDF3X papers for
more discussion) because it only considers triples independently. The
stats are an approximation.

See also the current fixed optimizer.
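
To make the "deciding between alternatives" point concrete, here is a minimal sketch of statistics-driven reordering. It is not TDB's actual algorithm: the STATS table, its numbers, the :type predicate and the helper names are all invented for illustration, and only the object position is inspected. Each pattern is estimated on its own, which is exactly the independence simplification described above.

```python
# Hypothetical statistics: estimated match counts per predicate, keyed by
# whether the object is bound (TERM) or unbound (VAR). Numbers are invented.
STATS = {
    (":identifier", "TERM"): 2,
    (":identifier", "VAR"): 5,
    (":type", "TERM"): 1000,
    (":type", "VAR"): 5000,
}


def is_var(node):
    """SPARQL-style variables start with '?' in this sketch."""
    return node.startswith("?")


def estimate(pattern, bound_vars):
    """Estimate one triple pattern in isolation from the others."""
    _, predicate, obj = pattern
    # A variable that an earlier pattern already bound counts as a TERM.
    kind = "VAR" if is_var(obj) and obj not in bound_vars else "TERM"
    return STATS[(predicate, kind)]


def reorder(patterns):
    """Greedy ordering: repeatedly pick the pattern with the lowest estimate."""
    remaining = list(patterns)
    bound_vars, ordered = set(), []
    while remaining:
        best = min(remaining, key=lambda p: estimate(p, bound_vars))
        remaining.remove(best)
        ordered.append(best)
        bound_vars.update(node for node in best if is_var(node))
    return ordered


query = [("?x", ":type", "?t"), ("?x", ":identifier", "P1")]
print(reorder(query))
```

With these invented numbers the selective :identifier pattern is moved ahead of the broad :type pattern. The estimates only need to rank the alternatives sensibly; they do not need to be exact.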


(var  :identifier  TERM) .. maybe 2.  It's not about exactness; only the
first triple gets an exact look up where you could have

(var  :identifier  P1) 2
(var  :identifier  P2) 2
(var  :identifier  P3) 1

It could reorder after every pattern but that might end up with the
optimizer costing more than the execution.
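
One possible reading of "maybe 2" is a single rounded figure standing in for all object values. The sketch below computes the exact per-object counts for the five triples in the question and their average. Using the average is an assumption made here for illustration; the thread does not say how the figure would be chosen.

```python
from collections import Counter

# The third data set from the question.
triples = [
    ("S1", ":identifier", "P1"),
    ("S2", ":identifier", "P1"),
    ("S3", ":identifier", "P2"),
    ("S4", ":identifier", "P2"),
    ("S5", ":identifier", "P3"),
]

# Exact counts for (var :identifier <object>), one entry per object value.
per_object = Counter(o for _, p, o in triples if p == ":identifier")

# Count for (var :identifier var): every use of the predicate.
total = sum(per_object.values())

# Assumed single estimate for (var :identifier TERM): the mean per object.
average = total / len(per_object)

print(dict(per_object))
print(total)
print(round(average, 2))
```

The mean is about 1.67, so a rule value of 2 is a reasonable stand-in when the specific object is not known in advance.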

       Andy


                              Thanks again!


                                          Tony

-----Original Message-----
From: Andy Seaborne [mailto:[email protected]]
Sent: 12 June 2014 2:08
To: [email protected]
Subject: Re: TDB OPTIMIZER question:a puzzled of RULE language about " VAR
and TERM "

On 11/06/14 08:03, DongNing(董宁.阿帕比) wrote:
Hi all:

I am a beginner with Jena and I am studying TDB's optimizer. I have some
questions about the statistics rules.

1.       I think the difference between TERM and VAR is that VAR represents a
variable in SPARQL, whereas TERM only represents a possible value in the DB and
does not represent a variable in SPARQL.

Is that right?

Yes - TERM means "will be bound at this point"


2.       For a static graph DB (the triples are fixed and do not change),

should the count for (var  :identifier  TERM) and the count for (var  :identifier  var)
be the same?

No.

(var  :identifier  TERM) should be an estimate of the cardinality
when there is a specific value.  (var :identifier var) would be the count of
all uses of :identifier.

if :ifp is an inverse functional property,

(?x :ifp TERM) is one.
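
The difference between the two counts can be checked with a small sketch. The VAR marker and count_matches helper are hypothetical names used only here, and the data is a five-triple set in which every object value is used once, so :identifier behaves like an inverse functional property.

```python
VAR = None  # hypothetical wildcard marker: matches anything in that position


def count_matches(triples, pattern):
    """Count the triples matching a pattern; VAR positions match any value."""
    return sum(
        all(want is VAR or want == got for want, got in zip(pattern, triple))
        for triple in triples
    )


# S1..S5, each with its own object value P1..P5.
triples = [(f"S{i}", ":identifier", f"P{i}") for i in range(1, 6)]

all_uses = count_matches(triples, (VAR, ":identifier", VAR))
one_value = count_matches(triples, (VAR, ":identifier", "P1"))

print(all_uses)   # every use of :identifier
print(one_value)  # uses with one specific object value
```

Here the pattern with two variables counts all five uses of the predicate, while the pattern with a specific object matches a single triple.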


3.       There are a few explanations and samples on
http://jena.apache.org. Are there any other tutorials about the statistics
rules?

Only the code I'm afraid.

      Andy




Thanks!

Tony.Dong












