[ 
https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated JENA-813:
---------------------------
    Attachment: SlowDatasetGraph.java

Upload corrected code which generates correct number of quads in the single 
graph case.  It also verifies that the correct number of quads are generated by 
counting the quads seen during the iteration.

Still seeing marked performance differences, will edit the bug description to 
include the updated figures

> In-memory DatasetGraph is extremely slow to iterate over when lots of named 
> graphs are used
> -------------------------------------------------------------------------------------------
>
>                 Key: JENA-813
>                 URL: https://issues.apache.org/jira/browse/JENA-813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.12.1
>            Reporter: Rob Vesse
>         Attachments: SlowDatasetGraph.java, SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which 
> seemed to running exceptionally slow.  Basically the in-memory 
> {{DatasetGraph}} implementation will perform extremely poorly when trying to 
> iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked 
> difference between small numbers of quads in a single graph versus the quads 
> in individual graphs.  In my testing with only a few thousand quads placing 
> them in a single graph versus many individual graphs can see 5/6 orders of 
> magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on 
> my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 1000 quads
> 31 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 44 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads 
> in 1 graphs
> 64 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.009692 
> seconds to iterate over 1000 quads in 1 graphs
> 64 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 85 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads 
> in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.693548 
> seconds to iterate over 1000 quads in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 2000 quads
> 779 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads 
> in 1 graphs
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000047 
> seconds to iterate over 2000 quads in 1 graphs
> 788 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 821 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads 
> in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 6.914044 
> seconds to iterate over 2000 quads in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 3000 quads
> 7735 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads 
> in 1 graphs
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000032 
> seconds to iterate over 3000 quads in 1 graphs
> 7743 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7767 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads 
> in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 28.465885 seconds to iterate over 3000 quads in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 
> 4000 quads
> 36233 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36234 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads 
> in 1 graphs
> 36235 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000033 seconds to iterate over 4000 quads in 1 graphs
> 36235 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36253 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph 
> implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads 
> in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 73.393265 seconds to iterate over 4000 quads in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing 
> with 5000 quads
> 109646 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 5000 quads in 1 graphs
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000066 seconds to iterate over 5000 quads in 1 graphs
> 109648 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 109654 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 140.212082 seconds to iterate over 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing 
> with 6000 quads
> 249867 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 6000 quads in 1 graphs
> 249868 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000049 seconds to iterate over 6000 quads in 1 graphs
> 249868 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 249870 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 276.071471 seconds to iterate over 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing 
> with 7000 quads
> 525943 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 7000 quads in 1 graphs
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000023 seconds to iterate over 7000 quads in 1 graphs
> 525945 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 525950 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 428.267270 seconds to iterate over 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing 
> with 8000 quads
> 954218 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 8000 quads in 1 graphs
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 0.000010 seconds to iterate over 8000 quads in 1 graphs
> 954219 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 954222 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - 
> DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 
> 8000 quads in 8000 graphs
> 243342 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 
> 289.119560 seconds to iterate over 8000 quads in 8000 graphs
> {noformat}
> From debugging my best guess is this is down to how 
> {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by 
> iterating over the graph names and then iterating over the individual graphs 
> appending the iterators together.  Something in the structuring means 
> accessing and appending all those individual iterators is drastically slow 
> compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to 
> iterate will fail with a {{StackOverflowError}}, the culprit in this case is 
> {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to