[ 
https://issues.apache.org/jira/browse/JENA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233140#comment-17233140
 ] 

Andy Seaborne commented on JENA-1988:
-------------------------------------

The node table is two-way:

Node -> NodeId

NodeId -> Node

The first is splittable by node kind but the reverse isn't - the NodeId does not 
have enough information on its own to identify the node type in all cases.
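
To illustrate the asymmetry, here is a minimal sketch in plain Java. It is not TDB code: the class TwoWayNodeTableSketch, the Kind enum, the Term class and the use of a list position as the id are all assumptions made up for this example. It also ignores any ids that do carry type information. The point is only that the forward direction can pick a per-kind map because the caller holds the node, while the reverse direction starts from a bare id and has to go through one shared table.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoWayNodeTableSketch {
    // Hypothetical node kinds, for illustration only.
    enum Kind { URI, LITERAL, BLANK }

    // Hypothetical stand-in for an RDF term: a kind plus its lexical form.
    static final class Term {
        final Kind kind;
        final String lexical;

        Term(Kind kind, String lexical) {
            this.kind = kind;
            this.lexical = lexical;
        }
    }

    // Node -> NodeId: one map per kind works, because the caller
    // already holds the term and therefore knows its kind.
    private final Map<Kind, Map<String, Long>> forward = new EnumMap<>(Kind.class);

    // NodeId -> Node: a single shared table. In this sketch the id is
    // just a position (an assumption) and says nothing about the kind.
    private final List<Term> reverse = new ArrayList<>();

    // Returns the existing id for the term, or allocates the next one.
    long getOrAllocate(Term term) {
        Map<String, Long> byKind = forward.computeIfAbsent(term.kind, k -> new HashMap<>());
        Long existing = byKind.get(term.lexical);
        if (existing != null) {
            return existing;
        }
        long id = reverse.size();
        reverse.add(term);
        byKind.put(term.lexical, id);
        return id;
    }

    // Reverse lookup: there is no way to choose a per-kind table from the id.
    Term lookup(long id) {
        return reverse.get((int) id);
    }

    // Number of entries held in the forward map for one kind.
    int forwardSize(Kind kind) {
        Map<String, Long> byKind = forward.get(kind);
        return byKind == null ? 0 : byKind.size();
    }

    public static void main(String[] args) {
        TwoWayNodeTableSketch table = new TwoWayNodeTableSketch();
        long uriId = table.getOrAllocate(new Term(Kind.URI, "http://example/a"));
        long litId = table.getOrAllocate(new Term(Kind.LITERAL, "hello"));
        System.out.println(uriId + " -> " + table.lookup(uriId).kind);
        System.out.println(litId + " -> " + table.lookup(litId).kind);
    }
}
```

Splitting the forward maps here changes which hash map is consulted, not the cost of a lookup, which is why measuring first is worthwhile.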

The cache is an LRU cache provided by Google Guava (Jena shades the artifact to 
avoid version clashes).

The cache is a hash table so lookup is O(1), I believe. I suggest looking 
carefully at whether splitting it will have a measurable advantage.
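
For readers unfamiliar with the idea, a hash-based LRU cache can be sketched as below. This is not the shaded Guava cache that Jena uses. It swaps in java.util.LinkedHashMap in access order as a stand-in, and the class name LruCacheSketch and the capacity of 2 are assumptions for the example. Lookup is a hash lookup (constant time on average) and the least recently used entry is dropped once the capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in LRU cache built on LinkedHashMap; not Jena's or Guava's implementation.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<Long, String> cache = new LruCacheSketch<>(2);
        cache.put(1L, "a");
        cache.put(2L, "b");
        cache.get(1L);       // touch 1, so 2 becomes the least recently used
        cache.put(3L, "c");  // exceeds capacity, evicts 2
        System.out.println(cache.containsKey(2L)); // prints "false"
    }
}
```

Because each lookup is already a single hash probe, splitting one such cache into several mostly changes how the capacity is shared between node kinds, not the per-lookup cost.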

I haven't seen that node access is a significant cost at query time but maybe 
you have a use case where it is.

At load time, the node table can be a cost-point - just shifting the bytes of 
all the RDF terms can be significant.

> Separating B+ tree into different Node representations.
> -------------------------------------------------------
>
>                 Key: JENA-1988
>                 URL: https://issues.apache.org/jira/browse/JENA-1988
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB
>    Affects Versions: Jena 3.16.0
>            Reporter: Martin Pekár
>            Priority: Major
>              Labels: features, newbie, performance, test
>             Fix For: Jena 3.16.0
>
>         Attachments: NodeTableNative.java
>
>
> In a project to optimize the indexing, I am trying to have 4 indexes, one for 
> each Node type (variable, literal, URI and blank). To implement this, I added 
> 4 copies of the _nodeHashToId_ Index instance in the _NodeTableNative_ class. 
> Then, for every operation on the _nodeHashToId_, for example using 
> _containsNode()_ in the NodeTableNative class, I first check the type of Node 
> given as parameter and then check for existence in the appropriate 
> _nodeHashToId_ copy.
> Now, for some reason I get a NullPointerException when running the tests. 
> Many of these exceptions appear in the _BufferChannelFile_ class in the 
> _size()_ method because the call to _file.channel()_ returns null.
> My question is then, is _NodeTableNative_ even the right place to implement 
> this optimization, and second, if it is the right place to implement, can you 
> help me understand why this exception is thrown?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
