You may be interested in the following paper on a technique called RDF
Characteristic Sets:
http://www.csd.uoc.gr/~hy561/papers/storageaccess/optimization/Characteristic%20Sets.pdf

It tries to solve the problem Andy alludes to: most stats-based optimisers
consider triple patterns in isolation from each other rather than as
complete units.  The downside of the RDF Characteristic Sets approach is
that the sets are potentially very expensive to calculate and would be
awkward to maintain for mutable data sets.
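
To make the difference concrete, here is a minimal sketch in Python (not
Jena or TDB code). It assumes the characteristic set of a subject is simply
the set of predicates that subject is used with. The :title triples and the
helper names pattern_counts, characteristic_sets and subjects_with_all are
made up for illustration; only the :identifier triples come from Tony's
example.

```python
from collections import Counter, defaultdict

# Tony's third example, plus a made-up :title predicate (an assumption,
# not from the thread) so that subjects differ in the predicates they use.
triples = [
    ("S1", ":identifier", "P1"),
    ("S2", ":identifier", "P1"),
    ("S3", ":identifier", "P2"),
    ("S4", ":identifier", "P2"),
    ("S5", ":identifier", "P3"),
    ("S1", ":title", "T1"),
    ("S2", ":title", "T2"),
]


def pattern_counts(triples, predicate):
    """Return the count for (var p var) and the per-object counts for (var p TERM)."""
    per_object = Counter(o for _, p, o in triples if p == predicate)
    return sum(per_object.values()), per_object


def characteristic_sets(triples):
    """Count how many subjects share each distinct set of predicates."""
    predicates_of = defaultdict(set)
    for s, p, _ in triples:
        predicates_of[s].add(p)
    return Counter(frozenset(ps) for ps in predicates_of.values())


def subjects_with_all(char_sets, predicates):
    """Number of subjects whose predicate set contains every given predicate."""
    wanted = frozenset(predicates)
    return sum(n for ps, n in char_sets.items() if wanted <= ps)


# Per-pattern statistics: each pattern is looked at on its own.
total, per_object = pattern_counts(triples, ":identifier")
average_per_term = total / len(per_object)  # one possible single estimate

# Characteristic sets: the patterns sharing a subject are looked at together.
char_sets = characteristic_sets(triples)
both = subjects_with_all(char_sets, {":identifier", ":title"})
```

Here total is 5 and the per-object counts are 2, 2 and 1, so a single
figure for (var :identifier TERM) can only be an approximation. The
characteristic sets additionally record that only 2 of the 5 subjects have
both predicates, which per-pattern counts cannot tell you. Building them
means a pass over every subject, which is where the cost comes from.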

Rob

On 12/06/2014 13:36, "Andy Seaborne" <[email protected]> wrote:

>On 12/06/14 03:35, DongNing(董宁.阿帕比) wrote:
>> Thanks Andy!
>> Some more detail on question 2:
>> Suppose the triple store contains the following:
>>      S1 :identifier P1
>>      S2 :identifier P2
>>      S3 :identifier P3
>>      S4 :identifier P4
>>      S5 :identifier P5
>>
>>      The count for (var  :identifier  TERM) is 1
>>      The count for (var  :identifier  var ) is 5
>> Is that OK?
>
>Yes
>
>> But if the triples are like these:
>>      S1 :identifier P1
>>      S2 :identifier P1
>>      S3 :identifier P1
>>      S4 :identifier P1
>>      S5 :identifier P1
>>
>> Is the count for (var  :identifier  TERM) 1 or 5? I think it is 5.
>
>5
>
>> The count for (var  :identifier  var ) is 5.
>> Is that OK?
>> One more situation: if the triples are like these
>>      S1 :identifier P1
>>      S2 :identifier P1
>>      S3 :identifier P2
>>      S4 :identifier P2
>>      S5 :identifier P3
>> What is the count for (var  :identifier  TERM)?
>
>Overall points first:
>
>* the optimizer is not trying to find the perfect answer, it's trying to
>find a reasonable answer, mainly deciding between alternatives.  And to
>some extent its role in life is avoiding the bad as much as finding the
>good!
>
>* The stats optimizer isn't a perfect scheme (see the RDF3X papers for
>more discussion) because it only considers triple patterns independently.
>The stats are an approximation.
>
>See also the current fixed optimizer.
>
>
>(var  :identifier  TERM) .. maybe 2.  It's not about exactness; only the
>first triple pattern gets an exact lookup, where you could have
>(var  :identifier  P1) 2
>(var  :identifier  P2) 2
>(var  :identifier  P3) 1
>
>It could reorder after every pattern but that might end up with the
>optimizer costing more than the execution.
>
>       Andy
>
>> Thanks again!
>>
>> Tony
>>
>> -----Original Message-----
>> From: Andy Seaborne [mailto:[email protected]]
>> Sent: 12 June 2014 2:08
>> To: [email protected]
>> Subject: Re: TDB OPTIMIZER question:a puzzled of RULE language about " VAR
>>and TERM "
>>
>> On 11/06/14 08:03, DongNing(董宁.阿帕比) wrote:
>>> Hi all:
>>>
>>> I am a beginner with Jena, and I am studying TDB's optimizer, in
>>> particular the statistics rules.
>>>
>>> 1.       I think the difference between TERM and VAR is that VAR
>>> represents a variable in SPARQL, while TERM only represents a possible
>>> value in the DB; it does not represent a variable in SPARQL.
>>>
>>> Is that right?
>>
>> Yes - TERM means "will be bound at this point"
>>
>>>
>>> 2.       For a static graph DB (the triples are fixed and do not change),
>>> should the count for (var  :identifier  TERM) and the count for
>>> (var  :identifier  var) be the same?
>>
>> No.
>>
>> (var  :identifier  TERM) should be an estimate of the cardinality when
>> there is a specific value.  (var :identifier var) would be the count of
>> all uses of :identifier.
>>
>> If :ifp is an inverse functional property,
>>
>> (?x :ifp TERM) is one.
>>
>>>
>>> 3.       There are only a few explanations and samples on
>>> http://jena.apache.org . Are there any other tutorials about the
>>> statistics rules?
>>
>> Only the code I'm afraid.
>>
>>      Andy
>>
>>>
>>>
>>>
>>> Thanks!
>>>
>>> Tony.Dong
>>>
>>>
>>
>>
>



