[ 
https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse reassigned JENA-813:
------------------------------

    Assignee: Rob Vesse

> In-memory DatasetGraph is extremely slow to iterate over when lots of named 
> graphs are used
> -------------------------------------------------------------------------------------------
>
>                 Key: JENA-813
>                 URL: https://issues.apache.org/jira/browse/JENA-813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.12.1
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>         Attachments: SlowDatasetGraph.java, SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which 
> seemed to running exceptionally slow.  Basically the in-memory 
> {{DatasetGraph}} implementation will perform extremely poorly when trying to 
> iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked 
> difference between small numbers of quads in a single graph versus the quads 
> in individual graphs.  In my testing with only a few thousand quads placing 
> them in a single graph versus many individual graphs can see 5/6 orders of 
> magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on 
> my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 1000 quads
> 57 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 90 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads 
> in 1 graphs
> 123 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.021191 
> seconds to iterate over 1000 quads in 1 graphs
> 123 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 144 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads 
> in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.716750 
> seconds to iterate over 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 2000 quads
> 861 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 885 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads 
> in 1 graphs
> 890 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.005400 
> seconds to iterate over 2000 quads in 1 graphs
> 890 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 922 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads 
> in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 7.002919 
> seconds to iterate over 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 3000 quads
> 7925 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads 
> in 1 graphs
> 7948 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.003924 
> seconds to iterate over 3000 quads in 1 graphs
> 7948 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7973 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads 
> in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 28.749688 seconds to iterate over 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 4000 quads
> 36723 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36727 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads 
> in 1 graphs
> 36730 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000773 seconds to iterate over 4000 quads in 1 graphs
> 36730 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36741 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads 
> in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 74.896949 seconds to iterate over 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing 
> with 5000 quads
> 111638 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111644 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 5000 quads in 1 graphs
> 111645 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000989 seconds to iterate over 5000 quads in 1 graphs
> 111645 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111655 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 5000 quads in 5000 graphs
> 285026 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 173.370614 seconds to iterate over 5000 quads in 5000 graphs
> {noformat}
> From debugging my best guess is this is down to how 
> {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by 
> iterating over the graph names and then iterating over the individual graphs 
> appending the iterators together.  Something in the structuring means 
> accessing and appending all those individual iterators is drastically slow 
> compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to 
> iterate will fail with a {{StackOverflowError}}, the culprit in this case is 
> {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to