NodeCache : keep or remove?

Andy Seaborne Fri, 20 Jul 2012 04:53:00 -0700

In JENA-279, the issue of whether the NodeCache serves any useful purpose these days has come up.


Proposal: Remove the node cache
Proposal: Remove the triple cache


Node cache:

There are two reasons for the cache: time saving (object creation costs) and space saving (reuse nodes). I'm not sure either of these applies much nowadays. Java has moved on; parsers should be doing the caching, and then the cache is per-run.

TDB does its own thing because it is caching the node file and the cache is NodeId to Node.

RIOT, for IRIs, does its own thing because it is coupled with caching IRI parsing, which is expensive because it's picky.


A quick test: parsing a file:
- - - - - - - - - - - - - -
With node cache:
bsbm-25m.nt.gz : 183.27 sec  25,000,250 triples  136,415.85 TPS

Without node cache:
Node.cache(false) ;
bsbm-25m.nt.gz : 179.19 sec  25,000,250 triples  139,514.99 TPS
- - - - - - - - - - - - - -

So I think that it is better to remove the Node and Triple caches and make reuse of Nodes (space saving, if any) the responsibility of the creation code (which is typically a parser or a persistent-to-memory storage unit).
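
To illustrate what "responsibility of the creation code" could look like, here is a minimal sketch of a per-run cache owned by a parser. Everything in it is hypothetical: ParserNodeCache and IriNode are stand-ins invented for this example, not Jena classes, and the LRU policy is just one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-run node cache owned by a parser; not part of the Jena API. */
final class ParserNodeCache {

    /** Stand-in for an RDF node; it only wraps the IRI string. */
    static final class IriNode {
        final String iri;
        IriNode(String iri) { this.iri = iri; }
    }

    private final Map<String, IriNode> cache;

    ParserNodeCache(final int maxEntries) {
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, IriNode>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, IriNode> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Return the cached node for this IRI, creating and caching it on a miss. */
    IriNode createIri(String iri) {
        IriNode node = cache.get(iri);
        if (node == null) {
            node = new IriNode(iri);
            cache.put(iri, node);
        }
        return node;
    }

    int size() { return cache.size(); }
}
```

The cache lives and dies with one parser run, so there is no global state to manage, and a parser that sees no benefit can simply not use one.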


I will check ARP to see what it does (unless anyone knows ...)

There are other caches at the Resource level so there is some overlap there.

Triple cache:

There is a Triple cache as well, although a lot of code goes direct to new Triple()

But any storage layer already checks for a triple on insertion, so there is no space saving within one graph. The rules engine has two graphs, so there is not much saving there either. In fact, the cache overhead is a net cost!
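
A toy model of the point about insertion checks, assuming a graph that behaves as a set of triples with value equality. ToyGraph and its Triple class are made up for this sketch and do not reflect how any Jena storage layer is implemented.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical set-backed graph; a toy model, not a Jena storage layer. */
final class ToyGraph {

    /** Stand-in triple with value-based equality. */
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Triple)) return false;
            Triple t = (Triple) other;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }

        @Override
        public int hashCode() { return Objects.hash(s, p, o); }
    }

    private final Set<Triple> triples = new HashSet<>();

    /** Returns false when an equal triple is already stored; the duplicate is not kept. */
    boolean add(Triple t) { return triples.add(t); }

    int size() { return triples.size(); }
}
```

Adding two separately constructed but equal triples leaves one stored object; the second becomes garbage straight away, so a shared Triple cache has nothing left to save within that graph.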


There is no Quad cache.

        Andy
