Paolo Castagna wrote:
> Hi,
> over the last few days I have experimented with different ways of
> generating TDB indexes (hopefully more scalable ones, in particular on
> machines with RAM constraints). These improvements could benefit
> tdbloader2 or a pure Java version of it (see [1]). One of them, in
> particular, is needed to complete tdbloader3 (i.e. a MapReduce
> implementation of a TDB loader).
> 
> This email focuses on the node table only, and more precisely on the B+Tree
> index of the node table. This index has records with 128-bit keys, which
> hold the hash of RDF node values, and 64-bit values, which hold the
> corresponding node ids. Given an RDF node, the index is used to retrieve
> its node id, so that RDF node values can be replaced with node ids before
> a query is executed (since queries use indexes that contain node ids only).
> 
> I'd like to be able to use the same technique that tdbloader2 uses in its
> final stage for the SPO, POS, OSP, GSPO, GPOS, etc. B+Tree indexes to build
> the B+Tree index of the node table (see [2]).
> 
> I know how to generate and sort a file containing hash|id; see [3] for an
> example.
> 
> However, I don't think the current BPlusTreeRewriter can be used as it is
> to rebuild a B+Tree index from such a file. I think the main reason is
> that it uses createKeyOnly().
> 
> Is that the only obstacle, or is it much more complicated than that?
> 
> Is it possible to change/adapt/extend BPlusTreeRewriter to support this use
> case as well?

Well, I was wrong: BPlusTreeRewriter works with Records with values as well.

Here:
https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/NodeTableRewriter.java
https://github.com/castagna/tdbloader3/blob/master/src/test/java/org/apache/jena/tdbloader3/TestNodeTableRewriter.java
https://github.com/castagna/tdbloader3/blob/master/src/main/java/cmd/nodetablebuilder.java

This can help with JENA-117 (i.e. a pure Java version of tdbloader2).
More tests are needed to establish whether it would be faster than the
current tdbloader2.
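
To make the hash|id record layout from the quoted message concrete, here is a minimal sketch in plain Java. It is not the code in NodeTableRewriter and it does not use any TDB classes. It packs a 16-byte hash and an 8-byte id into a fixed-width record and sorts the records by key, which is the order a bottom-up B+Tree build would expect. The class name NodeRecordSketch, its methods, the use of MD5 as the 128-bit hash, the 8-byte id width, and the sequential ids are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class NodeRecordSketch {
    static final int KEY_LEN = 16;                       // 128-bit hash of the node
    static final int VALUE_LEN = 8;                      // node id (assumed 8 bytes)
    static final int RECORD_LEN = KEY_LEN + VALUE_LEN;

    // Assumption: MD5 stands in for whatever 128-bit hash the node table uses.
    static byte[] hash(String nodeText) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(nodeText.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build one fixed-width record: key bytes followed by the big-endian id.
    static byte[] record(String nodeText, long nodeId) {
        return ByteBuffer.allocate(RECORD_LEN)
                .put(hash(nodeText))
                .putLong(nodeId)
                .array();
    }

    // Compare two records by their key bytes only, treating bytes as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_LEN; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return 0;
    }

    // Assign illustrative sequential ids, then sort the records by key.
    static List<byte[]> sortedRecords(List<String> nodes) {
        List<byte[]> records = new ArrayList<>();
        long id = 0;
        for (String node : nodes) {
            records.add(record(node, id++));
        }
        records.sort(NodeRecordSketch::compareKeys);
        return records;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        nodes.add("<http://example.org/s>");
        nodes.add("<http://example.org/p>");
        nodes.add("\"a literal\"");
        for (byte[] r : sortedRecords(nodes)) {
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < KEY_LEN; i++) hex.append(String.format("%02x", r[i]));
            System.out.println(hex + "|" + ByteBuffer.wrap(r).getLong(KEY_LEN));
        }
    }
}
```

Each printed line is one hash|id pair in key order. A real loader would write the sorted records to a file rather than keep them in memory.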

Paolo

> 
> Thanks,
> Paolo
> 
>  [1] https://issues.apache.org/jira/browse/JENA-117
>  [2] http://seaborne.blogspot.com/2010/12/repacking-btrees.html
>  [3]
> https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/NodeTableBuilder.java#L97
> 
