Hi,
I sometimes see queries like this one:
SELECT *
WHERE {
  ...
  ?event :start ?start .
  ?event :end ?end .
  FILTER (
    (?start > "2010-03-27T00:00:00Z"^^xsd:dateTime) &&
    (?end < "2010-03-28T00:00:00Z"^^xsd:dateTime)
  )
}
To evaluate the FILTER, TDB scans the indexes and the node table to find
the values which satisfy the filter expression.
If you have a large dataset, this kind of query can be slow.
TDB already encodes certain node values (including xsd:dateTime) inline
in the node ids, and the good news is that the encoding scheme preserves
the order.
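To illustrate what "preserves the order" means, here is a minimal sketch
of an order-preserving inline encoding. It is not TDB's actual NodeId
layout: the class name, the tag value and the 8/56 bit split are all
assumptions made for the example.

```java
import java.time.Instant;

/** Hypothetical sketch; this is NOT TDB's real NodeId layout. */
public final class InlineDateTimeSketch {
    // Assumed layout: top 8 bits hold a type tag, low 56 bits hold the value.
    static final long TYPE_DATETIME = 0x02L;   // hypothetical tag value
    static final int VALUE_BITS = 56;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;

    /** Packs epoch milliseconds (non-negative, below 2^56) into a tagged id. */
    static long encode(Instant t) {
        long millis = t.toEpochMilli();
        if (millis < 0 || millis > VALUE_MASK) {
            throw new IllegalArgumentException("out of inline range: " + t);
        }
        return (TYPE_DATETIME << VALUE_BITS) | millis;
    }

    /** Recovers the timestamp from the low 56 bits of the id. */
    static Instant decode(long id) {
        return Instant.ofEpochMilli(id & VALUE_MASK);
    }

    public static void main(String[] args) {
        long a = encode(Instant.parse("2010-03-27T00:00:00Z"));
        long b = encode(Instant.parse("2010-03-28T00:00:00Z"));
        // Both ids carry the same tag, so comparing the ids
        // gives the same answer as comparing the timestamps.
        System.out.println(Long.compare(a, b) < 0);
        System.out.println(decode(a));
    }
}
```

With an encoding like this, a comparison such as ?start > "..." can be
decided on the ids alone, without fetching the lexical form from the
node table.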
See NodeId's inline(Node node) method:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java
And DateTimeNode, for example:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/DateTimeNode.java
However, I don't think these inline node ids are currently used at
query time. Am I right?
Using them there is probably not a trivial change, but it seems worth
aiming for.
In theory, it should speed up this kind of FILTER expression, and TDB
would be able to answer certain queries without touching the node table
or scanning a large portion of your data just to find a few values.
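As a rough sketch of the idea (not TDB code), the following uses a
TreeMap as a stand-in for an index sorted by the object's inline id. It
assumes, for simplicity, that the inline id of a dateTime is just its
epoch milliseconds, and it only handles a lower and an upper bound on
the same variable. The class and method names are made up for the
example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch of a range scan over a sorted index; not TDB code. */
public final class RangeScanSketch {
    // Stand-in for an index sorted by the object's inline id.
    // Assumption: the inline id of a dateTime is its epoch milliseconds.
    private final NavigableMap<Long, List<String>> byStart = new TreeMap<>();

    static long inlineId(String dateTime) {
        return Instant.parse(dateTime).toEpochMilli();
    }

    void add(String event, String start) {
        byStart.computeIfAbsent(inlineId(start), k -> new ArrayList<>()).add(event);
    }

    /** Events with lower < start < upper, read from the matching key range only. */
    List<String> startedBetween(String lower, String upper) {
        List<String> out = new ArrayList<>();
        for (List<String> events
                : byStart.subMap(inlineId(lower), false, inlineId(upper), false).values()) {
            out.addAll(events);
        }
        return out;
    }

    public static void main(String[] args) {
        RangeScanSketch idx = new RangeScanSketch();
        idx.add("e1", "2010-03-26T12:00:00Z");
        idx.add("e2", "2010-03-27T09:30:00Z");
        idx.add("e3", "2010-03-29T08:00:00Z");
        System.out.println(
            idx.startedBetween("2010-03-27T00:00:00Z", "2010-03-28T00:00:00Z"));
    }
}
```

The point is that the bounds of the FILTER are turned into ids once, and
the sorted index is then asked for that key range directly, instead of
every candidate value being decoded and tested.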
Another very similar use case is queries involving locations (i.e.
latitude and longitude). Sometimes you want to find things within a
bounding box, so you have a similar expression for the latitude values
and another one for the longitude values.
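The bounding box case can be sketched in the same way: one range lookup
per coordinate, then an intersection of the two result sets. Again this
is only an illustration with made-up names, using plain doubles as keys
rather than any real inline encoding, and it ignores boxes that cross
the 180 degree meridian.

```java
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical bounding-box lookup over two sorted indexes; not TDB code. */
public final class BoundingBoxSketch {
    // Stand-ins for two indexes, sorted by latitude and by longitude.
    private final NavigableMap<Double, Set<String>> byLat = new TreeMap<>();
    private final NavigableMap<Double, Set<String>> byLon = new TreeMap<>();

    void add(String place, double lat, double lon) {
        byLat.computeIfAbsent(lat, k -> new HashSet<>()).add(place);
        byLon.computeIfAbsent(lon, k -> new HashSet<>()).add(place);
    }

    /** Collects everything whose key lies in [lo, hi]. */
    private static Set<String> range(NavigableMap<Double, Set<String>> index,
                                     double lo, double hi) {
        Set<String> out = new HashSet<>();
        index.subMap(lo, true, hi, true).values().forEach(out::addAll);
        return out;
    }

    /** Places inside the box: in the latitude range AND in the longitude range. */
    Set<String> within(double minLat, double maxLat, double minLon, double maxLon) {
        Set<String> hits = range(byLat, minLat, maxLat);
        hits.retainAll(range(byLon, minLon, maxLon));
        return hits;
    }

    public static void main(String[] args) {
        BoundingBoxSketch idx = new BoundingBoxSketch();
        idx.add("London", 51.5, -0.12);
        idx.add("Paris", 48.85, 2.35);
        idx.add("Bristol", 51.45, -2.59);
        System.out.println(idx.within(51.0, 52.0, -1.0, 1.0));
    }
}
```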
Is it worth opening a JIRA issue (i.e. a feature request) for this?
Paolo