zhztheplayer opened a new issue, #11509:
URL: https://github.com/apache/incubator-gluten/issues/11509
### Description
TreeMemoryConsumer is not thread-safe. This sometimes leads to concurrency errors, e.g., when one thread is adding a child memory consumer while another thread is spilling the same consumer:
```
[Executor task launch worker for task 13.0 in stage 195.0 (TID 4182)] ERROR org.apache.spark.util.Utils - Uncaught exception in thread Executor task launch worker for task 13.0 in stage 195.0 (TID 4182)
java.lang.NullPointerException: Cannot invoke "org.apache.spark.SparkEnv.blockManager()" because the return value of "org.apache.spark.SparkEnv$.get()" is null
    at org.apache.spark.scheduler.Task.$anonfun$run$3(Task.scala:146)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1375)
    at org.apache.spark.scheduler.Task.run(Task.scala:144)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
[Executor task launch worker for task 14.0 in stage 195.0 (TID 4183)] ERROR org.apache.spark.util.Utils - Aborting task
java.util.concurrent.ExecutionException: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Error during calling Java code from native code: java.util.ConcurrentModificationException
    at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
    at java.base/java.util.HashMap$ValueIterator.next(HashMap.java:1625)
    at java.base/java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1055)
    at java.base/java.util.AbstractQueue.addAll(AbstractQueue.java:186)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:63)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:68)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:50)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.spill(TreeMemoryConsumer.java:116)
    at org.apache.spark.memory.TaskMemoryManager.trySpillAndAcquire(TaskMemoryManager.java:228)
    at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:191)
    at org.apache.spark.memory.MemoryConsumer.acquireMemory(MemoryConsumer.java:137)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.borrow(TreeMemoryConsumer.java:66)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer$Node.borrow0(TreeMemoryConsumer.java:196)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer$Node.borrow(TreeMemoryConsumer.java:188)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer$Node.borrow0(TreeMemoryConsumer.java:196)
    at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer$Node.borrow(TreeMemoryConsumer.java:188)
    at org.apache.gluten.memory.memtarget.RetryOnOomMemoryTarget.borrow(RetryOnOomMemoryTarget.java:39)
    at org.apache.gluten.memory.memtarget.OverAcquire.borrow(OverAcquire.java:59)
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:60)
    at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:58)
    at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
    at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
    at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
    at org.apache.spark.sql.delta.stats.GlutenDeltaJobStatsTracker$VeloxTaskStatsAccumulator$$anon$2.call(GlutenDeltaJobStatsTracker.scala:275)
    at org.apache.spark.sql.delta.stats.GlutenDeltaJobStatsTracker$VeloxTaskStatsAccumulator$$anon$2.call(GlutenDeltaJobStatsTracker.scala:271)
    at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
    at --- Async.Stack.Trace --- (captured by IntelliJ IDEA debugger)
    at java.base/java.util.concurrent.FutureTask.<init>(FutureTask.java:132)
    at java.base/java.util.concurrent.AbstractExecutorService.newTaskFor(AbstractExecutorService.java:113)
    at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:144)
    at java.base/java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:753)
    at org.apache.spark.sql.delta.stats.GlutenDeltaJobStatsTracker$VeloxTaskStatsAccumulator.<init>(GlutenDeltaJobStatsTracker.scala:271)
    at org.apache.spark.sql.delta.stats.GlutenDeltaJobStatsTracker$GlutenDeltaTaskStatsTracker.$anonfun$newFile$1(GlutenDeltaJobStatsTracker.scala:116)
    at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
    at org.apache.spark.sql.delta.stats.GlutenDeltaJobStatsTracker$GlutenDeltaTaskStatsTracker.newFile(GlutenDeltaJobStatsTracker.scala:116)
    at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.$anonfun$renewCurrentWriter$6(FileFormatDataWriter.scala:300)
    at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.$anonfun$renewCurrentWriter$6$adapted(FileFormatDataWriter.scala:300)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:300)
    at org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$GlutenDynamicPartitionDataSingleWriter.beforeWrite(GlutenDeltaFileFormatWriter.scala:573)
    at org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$GlutenDynamicPartitionDataSingleWriter.write(GlutenDeltaFileFormatWriter.scala:596)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
    at org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.$anonfun$executeTask$1(GlutenDeltaFileFormatWriter.scala:486)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
    at org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.executeTask(GlutenDeltaFileFormatWriter.scala:493)
    at org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.$anonfun$executeWrite$4(GlutenDeltaFileFormatWriter.scala:321)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
```
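For illustration, here is a minimal standalone sketch of the race shape visible in the trace (class and field names are hypothetical, not Gluten code): one thread keeps registering children in an unsynchronized `HashMap` while another snapshots the values into a queue, as the `TreeMemoryTargets.spillTree` frames suggest.

```java
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical names; not Gluten code. One thread registers children in an
// unsynchronized HashMap while another snapshots the values into a queue,
// matching the AbstractQueue.addAll -> HashMap iterator frames above.
public class TreeRaceSketch {
  static final Map<String, Object> children = new HashMap<>();

  public static void main(String[] args) {
    Thread adder = new Thread(() -> {
      for (int i = 0; ; i++) {
        children.put("child-" + i, new Object()); // structural modification, no lock held
      }
    });
    adder.setDaemon(true);
    adder.start();

    for (int round = 0; round < 1_000_000; round++) {
      try {
        // spillTree-style snapshot: AbstractQueue.addAll iterates the fail-fast view.
        Queue<Object> queue = new PriorityQueue<>(Comparator.comparingInt(Object::hashCode));
        queue.addAll(children.values()); // typically fails within milliseconds
      } catch (ConcurrentModificationException e) {
        System.out.println("Reproduced: " + e); // same failure mode as reported
        return;
      }
    }
  }
}
```

A plausible direction for a fix would be to guard child registration and the spill-side traversal with a common lock (or take the spill snapshot under that lock); merely swapping in a `ConcurrentHashMap` would silence the exception but would not by itself give `spillTree` a consistent view of the tree.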
### Gluten version
None