[
https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Seaborne updated JENA-813:
-------------------------------
Comment: was deleted
(was: Try this, switching {{IteratorCons}} to {{IteratorConcat}}
{code:title=DatasetGraphCollection.java|borderStyle=solid}
@Override
protected Iterator<Quad> findInAnyNamedGraphs(Node s, Node p, Node o)
{
Iterator<Node> gnames = listGraphNodes() ;
IteratorConcat<Quad> iter = new IteratorConcat<>() ;
// Named graphs
for ( ; gnames.hasNext() ; )
{
Node gn = gnames.next();
Iterator<Quad> qIter = findInSpecificNamedGraph(gn, s, p, o) ;
if ( qIter != null )
// copes with null for iter
//iter = Iter.append(iter, qIter) ;
iter.add(qIter) ;
}
return iter ;
}
{code})
> In-memory DatasetGraph is extremely slow to iterate over when lots of named
> graphs are used
> -------------------------------------------------------------------------------------------
>
> Key: JENA-813
> URL: https://issues.apache.org/jira/browse/JENA-813
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 2.12.1
> Reporter: Rob Vesse
> Attachments: SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which
> seemed to running exceptionally slow. Basically the in-memory
> {{DatasetGraph}} implementation will perform extremely poorly when trying to
> iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked
> difference between small numbers of quads in a single graph versus the quads
> in individual graphs. In my testing with only a few thousand quads placing
> them in a single graph versus many individual graphs can see 5/6 orders of
> magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on
> my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 1000 quads
> 31 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 44 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads
> in 1 graphs
> 64 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.009692
> seconds to iterate over 1000 quads in 1 graphs
> 64 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 85 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads
> in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.693548
> seconds to iterate over 1000 quads in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 2000 quads
> 779 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads
> in 1 graphs
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.000047
> seconds to iterate over 2000 quads in 1 graphs
> 788 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 821 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads
> in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 6.914044
> seconds to iterate over 2000 quads in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 3000 quads
> 7735 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads
> in 1 graphs
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.000032
> seconds to iterate over 3000 quads in 1 graphs
> 7743 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 7767 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads
> in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 28.465885 seconds to iterate over 3000 quads in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 4000 quads
> 36233 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 36234 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads
> in 1 graphs
> 36235 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000033 seconds to iterate over 4000 quads in 1 graphs
> 36235 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 36253 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads
> in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 73.393265 seconds to iterate over 4000 quads in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing
> with 5000 quads
> 109646 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 5000 quads in 1 graphs
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000066 seconds to iterate over 5000 quads in 1 graphs
> 109648 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 109654 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 140.212082 seconds to iterate over 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing
> with 6000 quads
> 249867 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 6000 quads in 1 graphs
> 249868 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000049 seconds to iterate over 6000 quads in 1 graphs
> 249868 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 249870 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 276.071471 seconds to iterate over 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing
> with 7000 quads
> 525943 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 7000 quads in 1 graphs
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000023 seconds to iterate over 7000 quads in 1 graphs
> 525945 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 525950 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 428.267270 seconds to iterate over 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing
> with 8000 quads
> 954218 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 8000 quads in 1 graphs
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000010 seconds to iterate over 8000 quads in 1 graphs
> 954219 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 954222 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 8000 quads in 8000 graphs
> 243342 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 289.119560 seconds to iterate over 8000 quads in 8000 graphs
> {noformat}
> From debugging my best guess is this is down to how
> {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by
> iterating over the graph names and then iterating over the individual graphs
> appending the iterators together. Something in the structuring means
> accessing and appending all those individual iterators is drastically slow
> compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to
> iterate will fail with a {{StackOverflowError}}, the culprit in this case is
> {{IteratorCons}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)