[
https://issues.apache.org/jira/browse/TINKERPOP-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645526#comment-17645526
]
ASF GitHub Bot commented on TINKERPOP-2831:
-------------------------------------------
cole-bq commented on PR #1873:
URL: https://github.com/apache/tinkerpop/pull/1873#issuecomment-1344935225
@ministat I've explored this issue more deeply and I don't believe any
modification to TinkerPop is necessary. From my initial glance at the code I
was under the impression that `TinkerGraphIterator.hasNext()` would be a slow
operation because it uses exceptions to determine whether any elements remain.
In my testing, however, the `TinkerGraphIterator` returned by
`tg.vertices(vertexId)` always uses `FastNoSuchElementException` internally
during calls to `hasNext()`, so no stack trace is filled in on that path.
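For context, the pattern behind `FastNoSuchElementException` is a shared exception instance whose `fillInStackTrace()` is overridden to a no-op, so throwing it never pays the native stack-walk cost. A rough sketch of that pattern (the class name here is illustrative, not TinkerPop's actual source):

```java
import java.util.NoSuchElementException;

// Illustrative sketch of the "fast exception" pattern: a shared singleton
// whose fillInStackTrace() does nothing, so constructing and throwing it
// skips the expensive native stack capture entirely.
public class FastNoSuchElementExceptionSketch extends NoSuchElementException {

    private static final FastNoSuchElementExceptionSketch INSTANCE =
            new FastNoSuchElementExceptionSketch();

    private FastNoSuchElementExceptionSketch() {
    }

    public static FastNoSuchElementExceptionSketch instance() {
        return INSTANCE;
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        // No-op: the Throwable constructor's call lands here, so no stack
        // trace is ever recorded for this instance.
        return this;
    }

    public static void main(String[] args) {
        // The singleton carries no stack trace, only the end-of-iteration signal.
        System.out.println(instance().getStackTrace().length); // prints 0
    }
}
```

The trade-off is exactly the one discussed below: the exception becomes cheap to throw, but it carries no diagnostic stack trace.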
Based on my findings I would suggest modifying your example to:
```java
public TinkerVertex getOrCreateVertex(final long vertexId, final String label, final TinkerGraph tg) {
    final Iterator<Vertex> iter = tg.vertices(vertexId);
    if (iter.hasNext()) {
        // hasNext() guards this call, so next() will not throw.
        return (TinkerVertex) iter.next();
    }
    if (null != label) {
        return (TinkerVertex) tg.addVertex(T.label, label, T.id, vertexId);
    }
    return (TinkerVertex) tg.addVertex(T.id, vertexId);
}
```
In my testing this gives the same or slightly better performance than
switching `next()` to throw `FastNoSuchElementException` and calling `next()`
unguarded, without `hasNext()`.
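To make the cost difference concrete, here is a rough, unscientific timing sketch (plain `System.nanoTime()` loops, not a proper JMH benchmark; all names are illustrative) comparing a freshly constructed exception against a shared one that skips `fillInStackTrace()`:

```java
import java.util.NoSuchElementException;

// Rough illustration, not a rigorous benchmark: compares throwing a fresh
// NoSuchElementException (which fills in a stack trace on construction)
// against rethrowing a shared instance with a no-op fillInStackTrace().
public class ExceptionCostSketch {

    // Shared, stack-trace-free exception, in the style of TinkerPop's
    // FastNoSuchElementException.
    static final NoSuchElementException FAST = new NoSuchElementException() {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip the native stack capture
        }
    };

    // Times `iterations` runs of `r`, swallowing the end-of-iteration
    // exception the way an exception-driven iterator loop would.
    static long time(Runnable r, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                r.run();
            } catch (NoSuchElementException ignored) {
                // end-of-iteration signal; the stack trace is never read
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 100_000;
        long fresh = time(() -> { throw new NoSuchElementException(); }, n);
        long shared = time(() -> { throw FAST; }, n);
        System.out.printf("fresh exception: %d ns, shared fast exception: %d ns%n",
                fresh, shared);
    }
}
```

Absolute numbers vary by JVM and warm-up, but the gap comes almost entirely from `fillInStackTrace()`, which is exactly what the guarded `hasNext()` path avoids paying.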
I would recommend that we continue using `NoSuchElementException` as is. Even
though its stack trace isn't currently used, it may prove useful at some point
in the future, and if guarding with `hasNext()` avoids the performance issue,
I see no benefit in removing the stack trace.
> Throw NoSuchElementException frequently, which slows performance
> ----------------------------------------------------------------
>
> Key: TINKERPOP-2831
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2831
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.5.4
> Reporter: Redriver
> Priority: Major
> Attachments: Screen Shot 2022-11-24 at 11.35.40.png
>
>
> When I run g.V().label().groupCount() on a huge graph (600m vertices + 6
> billion edges), the JVM async profiler exposed NoSuchElementException as a
> hotspot. In fact, that exception is used only to inform the caller that the
> end of the iteration has been reached, so its stack trace information is
> never used. In addition, creating a new exception every time is unnecessary.
> {code:java}
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783) => holding
> Monitor(java.util.NoSuchElementException@1860956919})
> java.lang.Throwable.<init>(Throwable.java:250)
> java.lang.Exception.<init>(Exception.java:54)
> java.lang.RuntimeException.<init>(RuntimeException.java:51)
> java.util.NoSuchElementException.<init>(NoSuchElementException.java:46)
> org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraphIterator.next(TinkerGraphIterator.java:63)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.getOrCreateVertex(JanusGraphVertexDeserializer.java:192)
> org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:153)
> org.janusgraph.hadoop.formats.util.HadoopRecordReader.nextKeyValue(HadoopRecordReader.java:69)
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:348)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> org.apache.spark.scheduler.Task.run(Task.scala:121)
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:416)
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:422)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)