[ https://issues.apache.org/jira/browse/JENA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182955#comment-15182955 ]

ASF GitHub Bot commented on JENA-1147:
--------------------------------------

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/128#discussion_r55197173
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/riot/system/FactoryRDFCaching.java ---
    @@ -0,0 +1,110 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.riot.system;
    +
    +import java.util.concurrent.ExecutionException ;
    +
    +import org.apache.jena.ext.com.google.common.cache.Cache ;
    +import org.apache.jena.atlas.lib.cache.CacheInfo ;
    +import org.apache.jena.datatypes.RDFDatatype ;
    +import org.apache.jena.datatypes.xsd.XSDDatatype ;
    +import org.apache.jena.ext.com.google.common.cache.CacheBuilder ;
    +import org.apache.jena.ext.com.google.common.cache.CacheStats ;
    +import org.apache.jena.graph.Node ;
    +import org.apache.jena.riot.RiotException ;
    +import org.apache.jena.riot.lang.LabelToNode ;
    +import org.apache.jena.sparql.graph.NodeConst ;
    +
    +/** Adds some caching of created nodes - the caching is tuned to RIOT parser usage */
    +public class FactoryRDFCaching extends FactoryRDFStd {
    +    public static final int DftNodeCacheSize = 5000 ; 
    +
    +    // Control the setup - for one thread; start size = 50% of full size, no stats
    +    private final Cache<String, Node> cache ;
    +
    +    public FactoryRDFCaching() {
    +        this(DftNodeCacheSize) ;
    +    }
    +    
    +    public FactoryRDFCaching(int cacheSize) {
    +        super() ;
    +        cache = setCache(cacheSize) ;
    +    }
    +
    +    private Cache<String, Node> setCache(int cacheSize) {
    +        return CacheBuilder.newBuilder()
    +            .maximumSize(cacheSize)
    +            .initialCapacity(cacheSize/2)
    +            //.recordStats()
    +            .concurrencyLevel(1)
    +            .build() ;
    +    }
    +
    +    public FactoryRDFCaching(int cacheSize, LabelToNode labelMapping) {
    +        super(labelMapping) ;
    +        cache = setCache(cacheSize) ;
    +    }
    +
    +    @Override
    +    public Node createURI(String uriStr) {
    +        try {
    +            return cache.get(uriStr, ()->RiotLib.createIRIorBNode(uriStr)) ;
    --- End diff --
    
    `NodeFactory` is fundamental and provides exactly the creation operations, without opinion: maybe you want to create a real URI that looks like `<_:label>`, e.g. for output.
    
    `RiotLib.createIRIorBNode` understands the `<_:label>` form. 
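    To make the distinction concrete, here is a minimal, stdlib-only Java sketch. It does not use Jena: `Node`, `UriNode`, `BlankNode` and the class `IriOrBNodeSketch` are hypothetical stand-ins, and it assumes (as the comment implies) that the parser passes on the text between the angle brackets, so `<_:b0>` arrives as `_:b0`. The two static methods only mimic the behaviour described for `NodeFactory.createURI` and `RiotLib.createIRIorBNode`; they are not the real implementations.

```java
/**
 * Hypothetical, stdlib-only sketch (not Jena code). It contrasts an
 * opinion-free factory that always makes a URI node with a RIOT-style
 * helper that reads "_:label" (written <_:label> in the input) as a blank node.
 */
public class IriOrBNodeSketch {

    // Minimal stand-ins for Jena's Node classes, assumed for illustration only.
    public interface Node {}

    public static final class UriNode implements Node {
        private final String uri;
        UriNode(String uri) { this.uri = uri; }
        public String uri() { return uri; }
        @Override public String toString() { return "<" + uri + ">"; }
    }

    public static final class BlankNode implements Node {
        private final String label;
        BlankNode(String label) { this.label = label; }
        public String label() { return label; }
        @Override public String toString() { return "_:" + label; }
    }

    private static final String BNODE_PREFIX = "_:";

    /** Mimics NodeFactory.createURI: always a URI, whatever the string looks like. */
    public static Node createURI(String uriStr) {
        return new UriNode(uriStr);
    }

    /** Mimics RiotLib.createIRIorBNode: a "_:" prefix means a blank node label. */
    public static Node createIRIorBNode(String uriStr) {
        if (uriStr.startsWith(BNODE_PREFIX)) {
            return new BlankNode(uriStr.substring(BNODE_PREFIX.length()));
        }
        return createURI(uriStr);
    }

    public static void main(String[] args) {
        System.out.println(createURI("_:b0"));                    // prints <_:b0>  (a real URI)
        System.out.println(createIRIorBNode("_:b0"));             // prints _:b0    (a blank node)
        System.out.println(createIRIorBNode("http://example/s")); // prints <http://example/s>
    }
}
```

    Keeping the opinion-free factory separate means a caller that genuinely wants a URI whose text starts with `_:` can still get one, while the parser-facing path applies the `<_:label>` convention.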


> Add a node cache step to RIOT parsing.
> --------------------------------------
>
>                 Key: JENA-1147
>                 URL: https://issues.apache.org/jira/browse/JENA-1147
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: RIOT
>    Affects Versions: Jena 3.0.1
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Minor
>
> A node cache on the parsing pipeline will reduce the memory footprint.
> It may be worth using different caches for subject/predicate/object as they
> have different characteristics.
> Care is needed because sometimes the parser is not creating stored objects
> (e.g. TDB loading), so the cache should not add measurable overhead.
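
A minimal sketch of the kind of cache the issue describes, assuming a single-threaded parser: a bounded LRU map from the lexical form of a term to its node object, built on `java.util.LinkedHashMap` rather than the Guava `Cache` used in the PR. `NodeCacheSketch` and its `creator` function are hypothetical names; the creator stands in for whatever actually builds the node (e.g. `RiotLib.createIRIorBNode`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Jena code): a bounded, single-threaded LRU cache
 * from lexical form to node object. Repeated terms in the input then share
 * one node instance instead of each occurrence creating a fresh copy.
 */
public class NodeCacheSketch<N> {

    private final Map<String, N> cache;
    private final Function<String, N> creator;

    public NodeCacheSketch(int maxSize, Function<String, N> creator) {
        this.creator = creator;
        // Start at half the maximum size, echoing the PR's initialCapacity(cacheSize/2).
        // accessOrder=true keeps entries ordered least-recently-used first.
        this.cache = new LinkedHashMap<String, N>(maxSize / 2, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, N> eldest) {
                return size() > maxSize; // evict the LRU entry once over the bound
            }
        };
    }

    /** Return the cached node for this string, creating (and caching) it on a miss. */
    public N get(String lexical) {
        return cache.computeIfAbsent(lexical, creator);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Capacity 2; new String(s) stands in for node creation (a fresh object per miss).
        NodeCacheSketch<String> cache = new NodeCacheSketch<>(2, s -> new String(s));
        String a1 = cache.get("http://example/a");
        String a2 = cache.get("http://example/a");
        System.out.println(a1 == a2);                            // true: a hit returns the same instance
        cache.get("http://example/b");
        cache.get("http://example/c");                           // over capacity: evicts "a"
        System.out.println(cache.size());                        // 2
        System.out.println(cache.get("http://example/a") == a1); // false: "a" was evicted and recreated
    }
}
```

Separate instances, one per triple position, would let the subject, predicate and object caches be sized independently; predicates, for instance, often come from a small vocabulary and hit the cache far more often than objects. On a miss the cache adds a hash lookup and a possible eviction on top of node creation, so for pipelines that do not keep the nodes (such as the TDB loading case above) it may be better to bypass it altogether.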



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
