[
https://issues.apache.org/jira/browse/JENA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182946#comment-15182946
]
ASF GitHub Bot commented on JENA-1147:
--------------------------------------
Github user ajs6f commented on a diff in the pull request:
https://github.com/apache/jena/pull/128#discussion_r55195827
--- Diff: jena-arq/src/main/java/org/apache/jena/riot/system/FactoryRDFCaching.java ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.riot.system;
+
+import java.util.concurrent.ExecutionException ;
+
+import org.apache.jena.ext.com.google.common.cache.Cache ;
+import org.apache.jena.atlas.lib.cache.CacheInfo ;
+import org.apache.jena.datatypes.RDFDatatype ;
+import org.apache.jena.datatypes.xsd.XSDDatatype ;
+import org.apache.jena.ext.com.google.common.cache.CacheBuilder ;
+import org.apache.jena.ext.com.google.common.cache.CacheStats ;
+import org.apache.jena.graph.Node ;
+import org.apache.jena.riot.RiotException ;
+import org.apache.jena.riot.lang.LabelToNode ;
+import org.apache.jena.sparql.graph.NodeConst ;
+
+/** Adds some caching of created nodes - the caching is tuned to RIOT parser usage */
+public class FactoryRDFCaching extends FactoryRDFStd {
+    public static final int DftNodeCacheSize = 5000 ;
+
+    // Control the setup - for one thread; start size = 50% of full size, no stats
+    private final Cache<String, Node> cache ;
+
+    public FactoryRDFCaching() {
+        this(DftNodeCacheSize) ;
+    }
+
+    public FactoryRDFCaching(int cacheSize) {
+        super() ;
+        cache = setCache(cacheSize) ;
+    }
+
+    private Cache<String, Node> setCache(int cacheSize) {
+        return CacheBuilder.newBuilder()
+            .maximumSize(cacheSize)
+            .initialCapacity(cacheSize/2)
+            //.recordStats()
+            .concurrencyLevel(1)
+            .build() ;
+    }
+
+    public FactoryRDFCaching(int cacheSize, LabelToNode labelMapping) {
+        super(labelMapping) ;
+        cache = setCache(cacheSize) ;
+    }
+
+    @Override
+    public Node createURI(String uriStr) {
+        try {
+            return cache.get(uriStr, ()->RiotLib.createIRIorBNode(uriStr)) ;
--- End diff --
This may be a dumb question, but why `RiotLib` here, why not `NodeFactory`?
> Add a node cache step to RIOT parsing.
> --------------------------------------
>
> Key: JENA-1147
> URL: https://issues.apache.org/jira/browse/JENA-1147
> Project: Apache Jena
> Issue Type: Improvement
> Components: RIOT
> Affects Versions: Jena 3.0.1
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Minor
>
> A node cache on the parsing pipeline will reduce the memory footprint.
> It may be worth using different caches for subject/predicate/object, as they
> have different characteristics.
> Care is needed because sometimes the parser is not creating stored objects
> (e.g. TDB loading), so the cache should not add measurable overhead.
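
For illustration only, the idea described in the issue (reuse one object per distinct URI string, bounded in size) can be sketched without any Jena or Guava dependency. Everything named below is hypothetical: `NodeCacheSketch`, `Term` (a stand-in for a Jena `Node`) and `LruCache` are assumptions made for this sketch, and a plain access-ordered `LinkedHashMap` is swapped in for the Guava `Cache` used in the pull request. It is single-threaded, matching the `concurrencyLevel(1)` setting in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a bounded, single-threaded cache of parsed terms. */
public class NodeCacheSketch {

    /** Stand-in for a Jena Node; NOT the real org.apache.jena.graph.Node. */
    public static final class Term {
        public final String uri;
        public Term(String uri) { this.uri = uri; }
    }

    /** Access-ordered map that evicts the least recently used entry when full. */
    public static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;
        private final int maxSize;

        public LruCache(int maxSize) {
            // Start at half the full size, mirroring initialCapacity(cacheSize/2) in the diff.
            super(Math.max(1, maxSize / 2), 0.75f, true);
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each put; returning true drops the least recently used entry.
            return size() > maxSize;
        }

        /** Return the cached value for key, creating and storing it on a miss. */
        public V getOrCreate(K key, Function<K, V> maker) {
            V value = get(key);
            if (value == null) {
                value = maker.apply(key);
                put(key, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        LruCache<String, Term> cache = new LruCache<>(2);
        Term first = cache.getOrCreate("http://example/a", Term::new);
        Term second = cache.getOrCreate("http://example/a", Term::new);
        // A repeated URI yields the same object, so only one Term is retained for it.
        System.out.println("same instance: " + (first == second));
    }
}
```

The memory saving comes from the repeated lookup returning the already-created object rather than a fresh one. Whether the lookup cost is acceptable when nodes are not retained (the TDB loading case mentioned in the issue) would need to be measured; this sketch does not address that.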
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)