[
https://issues.apache.org/jira/browse/TINKERPOP-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645526#comment-17645526
]
ASF GitHub Bot commented on TINKERPOP-2831:
-------------------------------------------
cole-bq commented on PR #1873:
URL: https://github.com/apache/tinkerpop/pull/1873#issuecomment-1344935225
@ministat I've explored this issue more deeply and I don't believe any
modification to TinkerPop is necessary. From my initial glance at the code I
was under the impression that `TinkerGraphIterator.hasNext()` would be a slow
operation because it uses exceptions to determine whether any elements remain.
In my testing, however, the `TinkerGraphIterator` returned by
`tg.vertices(vertexId)` always uses `FastNoSuchElementException` internally
during calls to `hasNext()`, so no stack trace is filled in on that path.
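For context, the pattern behind `FastNoSuchElementException` is a shared exception instance whose `fillInStackTrace()` is overridden to a no-op, so throwing it never pays the native stack-walk cost. A rough sketch of that pattern (the class name here is illustrative, not TinkerPop's actual source):

```java
import java.util.NoSuchElementException;

// Illustrative sketch of the "fast exception" pattern: a shared singleton
// whose fillInStackTrace() does nothing, so constructing and throwing it
// skips the expensive native stack capture entirely.
public class FastNoSuchElementExceptionSketch extends NoSuchElementException {

    private static final FastNoSuchElementExceptionSketch INSTANCE =
            new FastNoSuchElementExceptionSketch();

    private FastNoSuchElementExceptionSketch() {
    }

    public static FastNoSuchElementExceptionSketch instance() {
        return INSTANCE;
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        // No-op: the Throwable constructor's call lands here, so no stack
        // trace is ever recorded for this instance.
        return this;
    }

    public static void main(String[] args) {
        // The singleton carries no stack trace, only the end-of-iteration signal.
        System.out.println(instance().getStackTrace().length); // prints 0
    }
}
```

The trade-off is exactly the one discussed below: the exception becomes cheap to throw, but it carries no diagnostic stack trace.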
Based on my findings I would suggest modifying your example to:
```java
public TinkerVertex getOrCreateVertex(final long vertexId, final String label, final TinkerGraph tg) {
    final Iterator<Vertex> iter = tg.vertices(vertexId);
    if (iter.hasNext()) {
        // hasNext() guards this call, so next() will not throw.
        return (TinkerVertex) iter.next();
    }
    if (null != label) {
        return (TinkerVertex) tg.addVertex(T.label, label, T.id, vertexId);
    }
    return (TinkerVertex) tg.addVertex(T.id, vertexId);
}
```
In my testing this gives the same or slightly better performance than
switching `next()` to throw `FastNoSuchElementException` and calling `next()`
unguarded, without `hasNext()`.
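To make the cost difference concrete, here is a rough, unscientific timing sketch (plain `System.nanoTime()` loops, not a proper JMH benchmark; all names are illustrative) comparing a freshly constructed exception against a shared one that skips `fillInStackTrace()`:

```java
import java.util.NoSuchElementException;

// Rough illustration, not a rigorous benchmark: compares throwing a fresh
// NoSuchElementException (which fills in a stack trace on construction)
// against rethrowing a shared instance with a no-op fillInStackTrace().
public class ExceptionCostSketch {

    // Shared, stack-trace-free exception, in the style of TinkerPop's
    // FastNoSuchElementException.
    static final NoSuchElementException FAST = new NoSuchElementException() {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip the native stack capture
        }
    };

    // Times `iterations` runs of `r`, swallowing the end-of-iteration
    // exception the way an exception-driven iterator loop would.
    static long time(Runnable r, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                r.run();
            } catch (NoSuchElementException ignored) {
                // end-of-iteration signal; the stack trace is never read
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 100_000;
        long fresh = time(() -> { throw new NoSuchElementException(); }, n);
        long shared = time(() -> { throw FAST; }, n);
        System.out.printf("fresh exception: %d ns, shared fast exception: %d ns%n",
                fresh, shared);
    }
}
```

Absolute numbers vary by JVM and warm-up, but the gap comes almost entirely from `fillInStackTrace()`, which is exactly what the guarded `hasNext()` path avoids paying.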
I would recommend that we continue using `NoSuchElementException` as is. Even
though its stack trace isn't currently used, it may prove useful at some point
in the future, and if guarding with `hasNext()` avoids the performance issue,
I see no benefit in removing the stack trace.
> Throw NoSuchElementException frequently, which slows performance
> ----------------------------------------------------------------
>
> Key: TINKERPOP-2831
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2831
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.5.4
> Reporter: Redriver
> Priority: Major
> Attachments: Screen Shot 2022-11-24 at 11.35.40.png
>
>
> When I run g.V().label().groupCount() on a huge graph (600m vertices + 6
> billion edges), the JVM async profiler exposed NoSuchElementException as a
> hotspot. In fact, that exception is used only to inform the caller that the
> end of the iteration has been reached, so its stack trace information is
> never used. In addition, creating a new exception every time is unnecessary.
> {code:java}
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783) => holding
> Monitor(java.util.NoSuchElementException@1860956919})
> java.lang.Throwable.<init>(Throwable.java:250)
> java.lang.Exception.<init>(Exception.java:54)
> java.lang.RuntimeException.<init>(RuntimeException.java:51)
> java.util.NoSuchElementException.<init>(NoSuchElementException.java:46)
> org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraphIterator.next(TinkerGraphIterator.java:63)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.getOrCreateVertex(JanusGraphVertexDeserializer.java:192)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:153)
> org.janusgraph.hadoop.formats.util.HadoopRecordReader.nextKeyValue(HadoopRecordReader.java:69)
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:348)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> org.apache.spark.scheduler.Task.run(Task.scala:121)
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:416)
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:422)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)