Quick question: Would it make sense to have an immutable flag that would tell the optimizer (or other processes) that a dataset/model/graph is not likely to change?
More of a hint than a rule.

On Thu, Jun 12, 2014 at 2:15 PM, Rob Vesse <[email protected]> wrote:

> You may be interested in the following paper -
> http://www.csd.uoc.gr/~hy561/papers/storageaccess/optimization/Characteristic%20Sets.pdf
> - on a technique called RDF Characteristic Sets
>
> It tries to solve the problem Andy alludes to that most stats based
> optimisers consider triple patterns in isolation of each other rather than
> as complete units. The downside of the RDF Characteristic Sets approach
> is that they are potentially very expensive to calculate and would be
> awkward to maintain for mutable data sets.
>
> Rob
>
> On 12/06/2014 13:36, "Andy Seaborne" <[email protected]> wrote:
>
> >On 12/06/14 03:35, DongNing(董宁.阿帕比) wrote:
> >> Thanks Andy!
> >> For more detail on question 2:
> >> If a triples DB is such as below--
> >> S1 :identifier P1
> >> S2 :identifier P2
> >> S3 :identifier P3
> >> S4 :identifier P4
> >> S5 :identifier P5
> >>
> >> The count for (var :identifier TERM) is 1
> >> The count for (var :identifier var) is 5
> >> Is that OK?
> >
> >Yes
> >
> >> But if the triples are such as these:
> >> S1 :identifier P1
> >> S2 :identifier P1
> >> S3 :identifier P1
> >> S4 :identifier P1
> >> S5 :identifier P1
> >>
> >> Is the count for (var :identifier TERM) 1 or 5? I think it is 5.
> >
> >5
> >
> >> The count for (var :identifier var) is 5.
> >> Is that OK?
> >> In an additional situation, if the triples are like these:
> >> S1 :identifier P1
> >> S2 :identifier P1
> >> S3 :identifier P2
> >> S4 :identifier P2
> >> S5 :identifier P3
> >> The count for (var :identifier TERM) is ?
> >
> >Overall points first:
> >
> >* the optimizer is not trying to find the perfect answer, it's trying to
> >find a reasonable answer, mainly deciding between alternatives. And to
> >some extent its role in life is avoiding the bad as much as finding the
> >good!
> >
> >* The stats optimizer isn't a perfect scheme (see the RDF3X papers for
> >more discussion) because it only considers triples independently. The
> >stats are an approximation.
> >
> >See also the current fixed optimizer.
> >
> >
> >(var :identifier TERM) .. maybe 2. It's not about exactness; only the
> >first triple gets an exact look up where you could have
> >
> >(var :identifier P1) 2
> >(var :identifier P2) 2
> >(var :identifier P3) 1
> >
> >It could reorder after every pattern but that might end up with the
> >optimizer costing more than the execution.
> >
> > Andy
> >
> >> Thanks again!
> >>
> >> Tony
> >>
> >> -----Original Message-----
> >> From: Andy Seaborne [mailto:[email protected]]
> >> Sent: June 12, 2014 2:08
> >> To: [email protected]
> >> Subject: Re: TDB OPTIMIZER question:a puzzled of RULE language about " VAR and TERM "
> >>
> >> On 11/06/14 08:03, DongNing(董宁.阿帕比) wrote:
> >>> Hi all:
> >>>
> >>> I am a beginner with Jena and I am studying TDB's optimizer,
> >>> in particular the statistics rules.
> >>>
> >>> 1. I think the difference between TERM and VAR is that VAR represents
> >>> a variable in SPARQL. TERM only represents a probable value in the DB;
> >>> it doesn't represent a variable in SPARQL.
> >>>
> >>> Is that right?
> >>
> >> Yes - TERM means "will be bound at this point"
> >>
> >>>
> >>> 2. For a static graph DB (triples are fixed and do not change),
> >>> should the count for (var :identifier TERM) and the count for
> >>> (var :identifier var) be the same?
> >>
> >> No.
> >>
> >> (var :identifier TERM) should be an estimate of what the cardinality
> >> is when there is a specific value. (var :identifier var) would be the
> >> count of all uses of :identifier.
> >>
> >> if :ifp is an inverse functional property,
> >>
> >> (?x :ifp TERM) is one.
> >>
> >>>
> >>> 3. There are a few explanations and samples on
> >>> http://jena.apache.org. Are there any other tutorials about the
> >>> statistics rules?
> >>
> >> Only the code I'm afraid.
> >>
> >> Andy
> >>
> >>>
> >>> Thanks!
> >>>
> >>> Tony.Dong

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
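
To make the counts discussed in the quoted thread concrete, here is a minimal Python sketch over the third example data set (S1..S5). It is only an illustration, not TDB's implementation: the function names are hypothetical, and the "average matches per distinct object, rounded up" estimate is an assumption chosen because it reproduces the "maybe 2" figure mentioned in the thread.

```python
from collections import Counter
import math

# The third example from the thread: five triples sharing :identifier.
TRIPLES = [
    ("S1", ":identifier", "P1"),
    ("S2", ":identifier", "P1"),
    ("S3", ":identifier", "P2"),
    ("S4", ":identifier", "P2"),
    ("S5", ":identifier", "P3"),
]


def count_var_pred_var(triples, predicate):
    """Exact count for (var predicate var): every use of the predicate."""
    return sum(1 for _, p, _ in triples if p == predicate)


def exact_counts_per_object(triples, predicate):
    """Exact count for (var predicate OBJ), one entry per concrete object."""
    return Counter(o for _, p, o in triples if p == predicate)


def estimate_var_pred_term(triples, predicate):
    """Single estimate for (var predicate TERM).

    Assumption for illustration only: average number of matches per
    distinct object, rounded up. A real optimizer may choose differently.
    """
    per_object = exact_counts_per_object(triples, predicate)
    if not per_object:
        return 0
    return math.ceil(sum(per_object.values()) / len(per_object))


if __name__ == "__main__":
    print(count_var_pred_var(TRIPLES, ":identifier"))
    print(dict(exact_counts_per_object(TRIPLES, ":identifier")))
    print(estimate_var_pred_term(TRIPLES, ":identifier"))
```

With the first example (five distinct objects) the same estimate gives 1, and with the second (all objects P1) it gives 5, which matches the answers given in the thread. One number per predicate is only a rough guide, since it cannot describe how the values are distributed.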
