[
https://issues.apache.org/jira/browse/JENA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173760#comment-15173760
]
Andy Seaborne commented on JENA-1147:
-------------------------------------
A small prototype, reading some small datasets into memory. The cache is
{{String -> Node}}; at 10K slots it has a hit rate of ~90% on the NPG data.
{noformat}
chebi (1.1e6 triples)       ::  630 MB ->  308 MB
bsbm-1m                     ::  832 MB ->  376 MB
npg-contributors-dataset.nq :: 5569 MB -> 2377 MB
  (Nature Publishing Group data, 11.1e6 quads)
{noformat}
Roughly, the cache is
{{Node n = cache.getOrFill(uriStr, ()-> create_node_from_string(uriStr)) ;}}
So far, the effect on parsing speed seems to be quite small; sometimes parsing
is even slightly faster than without the cache.
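The {{getOrFill}} pattern above can be sketched as a small bounded LRU map. This is a hypothetical stand-alone version, not Jena's actual cache classes; {{String}} stands in for the node type, and the capacity and names are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a bounded String -> Node style cache.
// An access-ordered LinkedHashMap gives LRU eviction for free.
class NodeCache<K, V> {
    private final Map<K, V> map;

    NodeCache(final int maxSize) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once over capacity.
                return size() > maxSize;
            }
        };
    }

    // Return the cached value, or compute it, store it, and return it.
    V getOrFill(K key, Function<K, V> fill) {
        return map.computeIfAbsent(key, fill);
    }

    int size() { return map.size(); }
}
```

Repeated lookups for the same key then return the identical object, so duplicated URI strings in the input map to one shared node rather than many equal copies:

```java
NodeCache<String, String> cache = new NodeCache<>(10_000);
String n1 = cache.getOrFill("http://example/s", uri -> "Node(" + uri + ")");
String n2 = cache.getOrFill("http://example/s", uri -> "Node(" + uri + ")");
// n1 == n2 : the second call hit the cache.
```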
> Add a node cache step to RIOT parsing.
> --------------------------------------
>
> Key: JENA-1147
> URL: https://issues.apache.org/jira/browse/JENA-1147
> Project: Apache Jena
> Issue Type: Improvement
> Components: RIOT
> Affects Versions: Jena 3.0.1
> Reporter: Andy Seaborne
> Priority: Minor
>
> A node cache on the parsing pipeline will reduce memory footprint.
> It may be worth doing different caches for subject/predicate/object as they
> have different characteristics.
> Care is needed because sometimes the parser is not creating stored objects
> (e.g. TDB loading), so the cache should not add measurable overhead.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)