[
https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206224#comment-14206224
]
Andy Seaborne edited comment on JENA-813 at 11/11/14 10:01 AM:
---------------------------------------------------------------
With the new test case, I get
{noformat}
458 [main] INFO dev.SlowDatasetGraph - Took 0.002670 seconds to iterate over
10000 quads in 1 graphs
482 [main] INFO dev.SlowDatasetGraph - Took 0.013079 seconds to iterate over
10000 quads in 10000 graphs
{noformat}
with the fix above. Seems reasonable for the general purpose dataset
implementation.
Cleaned full output:
{noformat}
0 [main] - Testing with 1000 quads
41 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
61 [main] - Took 0.012906 seconds to iterate over 1000 quads in 1 graphs
67 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
79 [main] - Took 0.011935 seconds to iterate over 1000 quads in 1000 graphs
79 [main] - Testing with 2000 quads
84 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
85 [main] - Took 0.001112 seconds to iterate over 2000 quads in 1 graphs
90 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
100 [main] - Took 0.009100 seconds to iterate over 2000 quads in 2000 graphs
100 [main] - Testing with 3000 quads
107 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
109 [main] - Took 0.001506 seconds to iterate over 3000 quads in 1 graphs
124 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
129 [main] - Took 0.004989 seconds to iterate over 3000 quads in 3000 graphs
130 [main] - Testing with 4000 quads
139 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
141 [main] - Took 0.001675 seconds to iterate over 4000 quads in 1 graphs
149 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
157 [main] - Took 0.006983 seconds to iterate over 4000 quads in 4000 graphs
157 [main] - Testing with 5000 quads
170 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
172 [main] - Took 0.002355 seconds to iterate over 5000 quads in 1 graphs
193 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
202 [main] - Took 0.008339 seconds to iterate over 5000 quads in 5000 graphs
202 [main] - Testing with 6000 quads
220 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 1 graphs
232 [main] - Took 0.002339 seconds to iterate over 6000 quads in 1 graphs
258 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 6000 graphs
303 [main] - Took 0.044652 seconds to iterate over 6000 quads in 6000 graphs
303 [main] - Testing with 7000 quads
317 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 1 graphs
326 [main] - Took 0.008772 seconds to iterate over 7000 quads in 1 graphs
336 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 7000 graphs
364 [main] - Took 0.027352 seconds to iterate over 7000 quads in 7000 graphs
364 [main] - Testing with 8000 quads
375 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 1 graphs
377 [main] - Took 0.002378 seconds to iterate over 8000 quads in 1 graphs
389 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 8000 graphs
400 [main] - Took 0.010900 seconds to iterate over 8000 quads in 8000 graphs
401 [main] - Testing with 9000 quads
412 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 9000 quads in 1 graphs
415 [main] - Took 0.002503 seconds to iterate over 9000 quads in 1 graphs
427 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 9000 quads in 9000 graphs
439 [main] - Took 0.012150 seconds to iterate over 9000 quads in 9000 graphs
439 [main] - Testing with 10000 quads
455 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 10000 quads in 1 graphs
458 [main] - Took 0.002670 seconds to iterate over 10000 quads in 1 graphs
468 [main] - DatasetGraph implementation
com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 10000 quads in 10000 graphs
482 [main] - Took 0.013079 seconds to iterate over 10000 quads in 10000 graphs
{noformat}
was (Author: andy.seaborne):
With the new test case, I get
{noformat}
458 [main] INFO dev.SlowDatasetGraph - Took 0.002670 seconds to iterate over
10000 quads in 1 graphs
482 [main] INFO dev.SlowDatasetGraph - Took 0.013079 seconds to iterate over
10000 quads in 10000 graphs
{noformat}
with the fix above. Seems reasonable for the general purpose dataset
implementation.
> In-memory DatasetGraph is extremely slow to iterate over when lots of named
> graphs are used
> -------------------------------------------------------------------------------------------
>
> Key: JENA-813
> URL: https://issues.apache.org/jira/browse/JENA-813
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 2.12.1
> Reporter: Rob Vesse
> Attachments: SlowDatasetGraph.java, SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which
> seemed to running exceptionally slow. Basically the in-memory
> {{DatasetGraph}} implementation will perform extremely poorly when trying to
> iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked
> difference between small numbers of quads in a single graph versus the quads
> in individual graphs. In my testing with only a few thousand quads placing
> them in a single graph versus many individual graphs can see 5/6 orders of
> magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on
> my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 1000 quads
> 57 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 90 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads
> in 1 graphs
> 123 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.021191
> seconds to iterate over 1000 quads in 1 graphs
> 123 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 144 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads
> in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.716750
> seconds to iterate over 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 2000 quads
> 861 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 885 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads
> in 1 graphs
> 890 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.005400
> seconds to iterate over 2000 quads in 1 graphs
> 890 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 922 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads
> in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 7.002919
> seconds to iterate over 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 3000 quads
> 7925 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 7944 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads
> in 1 graphs
> 7948 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took 0.003924
> seconds to iterate over 3000 quads in 1 graphs
> 7948 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 7973 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads
> in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 28.749688 seconds to iterate over 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing with
> 4000 quads
> 36723 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 36727 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads
> in 1 graphs
> 36730 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000773 seconds to iterate over 4000 quads in 1 graphs
> 36730 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 36741 [main] INFO org.apache.jena.playground.SlowDatasetGraph - DatasetGraph
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads
> in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 74.896949 seconds to iterate over 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Testing
> with 5000 quads
> 111638 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 111644 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 5000 quads in 1 graphs
> 111645 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 0.000989 seconds to iterate over 5000 quads in 1 graphs
> 111645 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW - Lock : main
> 111655 [main] INFO org.apache.jena.playground.SlowDatasetGraph -
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has
> 5000 quads in 5000 graphs
> 285026 [main] INFO org.apache.jena.playground.SlowDatasetGraph - Took
> 173.370614 seconds to iterate over 5000 quads in 5000 graphs
> {noformat}
> From debugging my best guess is this is down to how
> {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by
> iterating over the graph names and then iterating over the individual graphs
> appending the iterators together. Something in the structuring means
> accessing and appending all those individual iterators is drastically slow
> compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to
> iterate will fail with a {{StackOverflowError}}, the culprit in this case is
> {{IteratorCons}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)