An optimisation for queries with FILTER ((?date > "..."^^xsd:dateTime) && 
(?date < "..."^^xsd:dateTime)) 
----------------------------------------------------------------------------------------------------------

                 Key: JENA-144
                 URL: https://issues.apache.org/jira/browse/JENA-144
             Project: Jena
          Issue Type: Improvement
          Components: TDB
            Reporter: Paolo Castagna


When TDB indexes literal values, it encodes the literal value directly into 
the NodeId where possible. 
See the NodeId.inline(Node node) method:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java
At query time, since there isn't an entry in the node table for values encoded 
in this way, there is no need to perform lookups on the node table.
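
As a rough illustration of inlining, here is a minimal sketch in Python (not 
TDB's Java code). It assumes a 64-bit id whose top 8 bits hold a type tag and 
whose low 56 bits hold the value. The tag value, the use of epoch seconds, and 
the function names are all hypothetical; the real encoding lives in 
NodeId.inline and may differ.

```python
from datetime import datetime, timezone

TYPE_SHIFT = 56                     # assumed layout: top 8 bits = type tag
VALUE_MASK = (1 << TYPE_SHIFT) - 1  # low 56 bits = inlined value
TAG_DATETIME = 0x04                 # hypothetical tag for xsd:dateTime


def inline_datetime(dt):
    """Pack a timezone-aware datetime into a 64-bit id, or None if it does not fit."""
    seconds = int(dt.timestamp())
    if seconds < 0 or seconds > VALUE_MASK:
        return None  # a real store would fall back to a node table entry
    return (TAG_DATETIME << TYPE_SHIFT) | seconds


def extract_datetime(node_id):
    """Recover the datetime from the id alone, with no node table lookup."""
    if node_id >> TYPE_SHIFT != TAG_DATETIME:
        raise ValueError("not an inlined xsd:dateTime")
    return datetime.fromtimestamp(node_id & VALUE_MASK, tz=timezone.utc)
```

In this toy encoding, a later date always yields a larger id. Any range scan 
over inlined values relies on the real encoding having that property too.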

Let's consider this query pattern:

    ?s <http://purl.org/dc/elements/1.1/date> ?date .
    FILTER ( ( ?date > "2011-06-06T00:00:00Z"^^xsd:dateTime ) &&
             ( ?date < "2011-06-07T00:00:00Z"^^xsd:dateTime ) )

In this case the POS index will be used, doing a partial scan with a fixed P: 
[(P,0,0), (P+1,0,0)) where P is the NodeId corresponding to the property used 
in the BGP (i.e. <http://purl.org/dc/elements/1.1/date> in the example above).
However, if there are many subjects with a date, the filter expression needs to 
be evaluated for all the date values. Even if those date values came straight 
out of the POS index and not from the node table, this can take a while.
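
The cost of this approach can be seen in a toy model. The sketch below (Python, 
not TDB code) assumes the POS index is a sorted list of (P, O, S) tuples of 
small integer ids; the function names and the data are made up for 
illustration.

```python
from bisect import bisect_left


def scan_fixed_p(pos_index, p):
    """Scan [(p,0,0), (p+1,0,0)): every entry whose predicate is p."""
    lo = bisect_left(pos_index, (p, 0, 0))
    hi = bisect_left(pos_index, (p + 1, 0, 0))
    return pos_index[lo:hi]


def filter_after_scan(pos_index, p, d1, d2):
    """Return (matches, number of entries the FILTER was evaluated on)."""
    scanned = scan_fixed_p(pos_index, p)
    matches = [(s, o) for (_, o, s) in scanned if d1 < o < d2]
    return matches, len(scanned)


# Hypothetical data: predicate 7 plays the role of dc:date.
pos_index = [(7, 100, 1), (7, 150, 2), (7, 200, 3), (7, 900, 4), (8, 150, 5)]
matches, evaluated = filter_after_scan(pos_index, 7, 120, 250)
```

Here the FILTER is evaluated on all four entries for predicate 7, although only 
two of them fall inside the requested range.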

We could have a better range index scan which starts at a particular value 
(e.g. "2011-06-06T00:00:00Z"^^xsd:dateTime, from the example above). The range 
index scan could be: [(P,D1,0), (P,D2,0)) where D1 and D2 are the NodeIds 
corresponding to the values specified in the FILTER expression.
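
A minimal sketch of such a range scan, again as a toy Python model rather than 
TDB code. It assumes the POS index is a sorted list of (P, O, S) tuples and 
that inlined ids sort in the same order as the dates they encode; the function 
name and the data are hypothetical.

```python
from bisect import bisect_left


def range_scan(pos_index, p, d1, d2):
    """Scan [(p,d1,0), (p,d2,0)) and re-check the FILTER's strict bounds."""
    lo = bisect_left(pos_index, (p, d1, 0))
    hi = bisect_left(pos_index, (p, d2, 0))
    scanned = pos_index[lo:hi]
    # The interval is closed at d1, so entries with o == d1 are still visited;
    # the strict "> d1" comparison drops them here.
    matches = [(s, o) for (_, o, s) in scanned if d1 < o < d2]
    return matches, len(scanned)


# Hypothetical data: predicate 7 plays the role of dc:date.
pos_index = [(7, 100, 1), (7, 120, 9), (7, 150, 2), (7, 200, 3),
             (7, 250, 6), (7, 900, 4), (8, 150, 5)]
matches, visited = range_scan(pos_index, 7, 120, 250)
```

Only three of the six entries for predicate 7 are visited. The filter is still 
applied to the narrowed range because the half-open interval includes objects 
equal to D1, which the strict comparison in the FILTER excludes.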

It is also not clear how the optimizer could decide whether such a range scan 
would be more selective than the other triple patterns in the query.

See a couple of threads on the jena-dev and jena-users mailing lists related to this:

 - http://markmail.org/thread/czopj5de3w62aacn
 - http://markmail.org/thread/pfwl6ukbpqfw23r6

(Or, maybe, this sort of optimisation is too specific, overly complicated... 
and a caching layer would solve this and many other performance related issues! 
;-))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
