Agreed.
The primary value of a node cache, from my POV, is space saving for in-memory
models. But that could indeed be done by ARP (if it isn't already) and
is probably better done at the resource level.
I wouldn't expect any significant effect on the rules engines from
scrapping these caches.
Dave
On 20/07/12 12:52, Andy Seaborne wrote:
In JENA-279, the issue of whether the NodeCache serves any useful
purpose these days has come up.
Proposal: Remove the node cache
Proposal: Remove the triple cache
Node cache:
There are two reasons for the cache: time saving (object creation costs)
and space saving (reuse of nodes). I'm not sure either of these applies much
nowadays. Java has moved on, and parsers should be doing the caching
themselves, so the cache is per-run.
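(For illustration only -- a minimal sketch of the kind of per-run interning a
parser could keep itself. The PerRunNodeTable class is hypothetical; only
Node.createURI is real API.)
- - - - - - - - - - - - - -
import com.hp.hpl.jena.graph.Node;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-parser-run intern table: one instance per parse,
// discarded afterwards, so there is no global cache to maintain.
class PerRunNodeTable {
    private final Map<String, Node> uriNodes = new HashMap<String, Node>();

    // Return the same Node object for the same URI string within this run.
    Node uriNode(String uri) {
        Node n = uriNodes.get(uri);
        if (n == null) {
            n = Node.createURI(uri);
            uriNodes.put(uri, n);
        }
        return n;
    }
}
- - - - - - - - - - - - - -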
TDB does its own thing because it is caching the node file, and its
cache maps NodeId to Node.
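(Roughly the shape of that cache, as a sketch: a bounded NodeId -> Node map in
front of the node file. The NodeIdCache class, the long key standing in for
NodeId, and the size are illustrative, not TDB's actual classes.)
- - - - - - - - - - - - - -
import com.hp.hpl.jena.graph.Node;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: bounded LRU cache from NodeId (here just a long) to Node, so
// repeated NodeId lookups avoid re-reading and decoding the node file entry.
class NodeIdCache {
    private static final int MAX_SIZE = 100000;   // illustrative size
    private final Map<Long, Node> cache =
        new LinkedHashMap<Long, Node>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Node> eldest) {
                return size() > MAX_SIZE;
            }
        };

    Node get(long nodeId)            { return cache.get(nodeId); }
    void put(long nodeId, Node node) { cache.put(nodeId, node); }
}
- - - - - - - - - - - - - -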
RIOT does its own thing for IRIs because its cache is coupled with
IRI parsing, which is expensive because the checking is picky.
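(Sketch of the idea only: remember which IRI strings have already passed the
strict checks so each distinct IRI is validated once per run. IriCheckCache
and expensiveIriValidation are placeholders, not the real RIOT code.)
- - - - - - - - - - - - - -
import java.util.HashSet;
import java.util.Set;

// Sketch: memoize IRI checking per run, so the costly, picky parse and
// validation happens once per distinct IRI string.
class IriCheckCache {
    private final Set<String> alreadyChecked = new HashSet<String>();

    void check(String iriStr) {
        if (alreadyChecked.contains(iriStr))
            return;                          // seen this run: skip the re-check
        expensiveIriValidation(iriStr);      // placeholder for the real checking
        alreadyChecked.add(iriStr);
    }

    private void expensiveIriValidation(String iriStr) {
        // stand-in: the real code parses the IRI and reports any violations
    }
}
- - - - - - - - - - - - - -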
A quick test: parsing a file:
- - - - - - - - - - - - - -
With node cache:
bsbm-25m.nt.gz : 183.27 sec 25,000,250 triples 136,415.85 TPS
Without node cache:
Node.cache(false) ;
bsbm-25m.nt.gz : 179.19 sec 25,000,250 triples 139,514.99 TPS
- - - - - - - - - - - - - -
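(Not the exact test code -- just a rough sketch of how such a with/without
comparison can be run. The Model-based read is illustrative; Node.cache(false)
is the switch being measured.)
- - - - - - - - - - - - - -
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class ParseTiming {
    public static void main(String[] args) throws Exception {
        Node.cache(false);   // turn the node cache off; omit this line for the cached run
        InputStream in = new GZIPInputStream(new FileInputStream("bsbm-25m.nt.gz"));
        Model m = ModelFactory.createDefaultModel();
        long start = System.currentTimeMillis();
        m.read(in, null, "N-TRIPLES");       // parse into an in-memory model
        double sec = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println(m.size() + " triples in " + sec + " sec");
    }
}
- - - - - - - - - - - - - -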
So I think it is better to remove the Node and Triple caches
and make reuse of Nodes (whatever space saving there is) the responsibility of
the creation code (which is typically a parser or a persistent-to-memory
storage unit).
I will check ARP to see what it does (unless anyone already knows ...)
There are other caches at the Resource level, so there is some overlap there.
Triple cache:
There is a Triple cache as well, although a lot of code goes directly to
new Triple().
But any storage layer already checks for an existing triple on insertion, so
there is no space saving within one graph. The rules engine has two graphs,
so there is not much saving there either. In fact, the cache overhead
is a net cost!
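(To make the point about no space saving within one graph concrete -- a minimal
sketch with an in-memory graph: adding the same triple twice leaves the size at
one, so a Triple cache would not shrink the graph.)
- - - - - - - - - - - - - -
import com.hp.hpl.jena.graph.Factory;
import com.hp.hpl.jena.graph.Graph;
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;

public class TripleDedup {
    public static void main(String[] args) {
        Graph g = Factory.createDefaultGraph();
        Node s = Node.createURI("http://example/s");
        Node p = Node.createURI("http://example/p");
        Node o = Node.createURI("http://example/o");
        g.add(new Triple(s, p, o));
        g.add(new Triple(s, p, o));     // second, distinct Triple object
        System.out.println(g.size());   // prints 1: the duplicate is not stored
    }
}
- - - - - - - - - - - - - -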
There is no Quad cache.
Andy