Redriver created TINKERPOP-2831:
-----------------------------------

             Summary: Throwing NoSuchElementException frequently slows performance
                 Key: TINKERPOP-2831
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2831
             Project: TinkerPop
          Issue Type: Improvement
          Components: process
    Affects Versions: 3.5.4
            Reporter: Redriver
         Attachments: profile_hdc49-mcc10-01-0210-5103-024-tess0131.stratus.rno.ebay.com.svg

When I run g.V().label().groupCount() on a huge graph (600 million vertices and 6 billion edges), the JVM async profiler shows that NoSuchElementException is a hotspot. That exception is thrown only to tell the caller that the end of the iteration has been reached, so its stack trace is never used. In addition, creating a new exception instance every time is unnecessary.
{code:java}
java.lang.Throwable.fillInStackTrace(Native Method)
java.lang.Throwable.fillInStackTrace(Throwable.java:783) => holding Monitor(java.util.NoSuchElementException@1860956919)
java.lang.Throwable.<init>(Throwable.java:250)
java.lang.Exception.<init>(Exception.java:54)
java.lang.RuntimeException.<init>(RuntimeException.java:51)
java.util.NoSuchElementException.<init>(NoSuchElementException.java:46)
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraphIterator.next(TinkerGraphIterator.java:63)
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.getOrCreateVertex(JanusGraphVertexDeserializer.java:192)
org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:153)
org.janusgraph.hadoop.formats.util.HadoopRecordReader.nextKeyValue(HadoopRecordReader.java:69)
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:348)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
org.apache.spark.scheduler.Task.run(Task.scala:121)
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:416)
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:422)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}
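
One common mitigation, which TinkerPop already uses elsewhere (e.g. org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException), is to pre-allocate a single exception instance whose fillInStackTrace() is overridden to a no-op, so signalling iterator exhaustion costs neither the native stack walk shown in the profile above nor an allocation. A minimal sketch of that idea, with hypothetical class names (not the actual TinkerGraphIterator fix):

{code:java}
import java.util.NoSuchElementException;

// Hypothetical sketch: a shared, stack-trace-free exception for signalling
// the end of an iteration. Names here are illustrative only.
public final class CachedNoSuchElementException extends NoSuchElementException {

    // One shared instance is safe because the exception carries no
    // per-call state and never captures a stack trace.
    private static final CachedNoSuchElementException INSTANCE =
            new CachedNoSuchElementException();

    private CachedNoSuchElementException() {
    }

    public static CachedNoSuchElementException instance() {
        return INSTANCE;
    }

    // Skip the expensive native fillInStackTrace() call that dominates
    // the profile above.
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}
{code}

An iterator's next() would then end with {{throw CachedNoSuchElementException.instance();}} instead of {{throw new NoSuchElementException();}}, removing both the allocation and the stack-trace capture from the hot path.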


