kecookier opened a new issue, #8268:
URL: https://github.com/apache/incubator-gluten/issues/8268
### Backend
VL (Velox)
### Bug description
Backtrace of the exception:
```
24/12/18 17:07:41 INFO Executor task launch worker for task 78584 ColumnarShuffleWriter: Gluten shuffle writer: Spilled 4250790592 / 8388608 bytes of data
24/12/18 17:07:41 INFO Executor task launch worker for task 78584 RetryOnOomMemoryTarget: Retrying spill require:1436968550 got:1431306240
24/12/18 17:07:41 INFO Executor task launch worker for task 78584 ColumnarShuffleWriter: Gluten shuffle writer: Trying to spill 9223372036854775807 bytes of data
24/12/18 17:07:41 ERROR Executor task launch worker for task 78584 ManagedReservationListener: Error unreserving memory from target
java.lang.IllegalStateException
    at org.apache.gluten.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:133)
    at org.apache.gluten.memory.memtarget.OverAcquire.repay(OverAcquire.java:77)
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.repay(ThrowOnOomMemoryTarget.java:124)
    at org.apache.gluten.memory.listener.ManagedReservationListener.unreserve(ManagedReservationListener.java:63)
    at org.apache.gluten.vectorized.ShuffleWriterJniWrapper.nativeEvict(Native Method)
    at org.apache.spark.shuffle.ColumnarShuffleWriter$$anon$1.spill(ColumnarShuffleWriter.scala:170)
    at org.apache.gluten.memory.memtarget.Spillers$AppendableSpillerList.spill(Spillers.java:86)
    at org.apache.gluten.memory.memtarget.Spillers$WithMinSpillSize.spill(Spillers.java:66)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:80)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:55)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:73)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:55)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:73)
    at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:55)
    at org.apache.gluten.memory.memtarget.RetryOnOomMemoryTarget.retryingSpill(RetryOnOomMemoryTarget.java:60)
    at org.apache.gluten.memory.memtarget.RetryOnOomMemoryTarget.borrow(RetryOnOomMemoryTarget.java:40)
    at org.apache.gluten.memory.memtarget.OverAcquire.borrow(OverAcquire.java:63)
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:40)
    at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
    at org.apache.gluten.vectorized.ShuffleWriterJniWrapper.write(Native Method)
    at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:177)
    at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:232)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
    at org.apache.spark.scheduler.Task.run(Task.scala:134)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:479)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1448)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:482)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
After https://github.com/apache/incubator-gluten/pull/8132,
`overTarget.borrow(overSize)` in `OverAcquire.borrow()` may trigger a retrying
spill. The spill calls `OverAcquire.repay()` while that borrow is still in
progress, and `repay()` checks
`Preconditions.checkState(overTarget.usedBytes() == 0);`. At that point,
however, `0 < overTarget.usedBytes() < overSize`, so the check throws
`IllegalStateException`. We can remove this precondition from `repay()` and
keep it only in `borrow()`.
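
The re-entrancy described above can be illustrated with a toy model. This is a minimal sketch, not Gluten's code: `ToyTarget`, `ToyOverAcquire`, `spillHook`, and the byte counts are hypothetical, and the "grant half, then spill" behaviour is an assumption chosen only to reproduce the `0 < overTarget.usedBytes() < overSize` state at the moment `repay()` is entered.

```java
public class OverAcquireSketch {
    static void checkState(boolean ok) {
        if (!ok) throw new IllegalStateException();
    }

    /** Toy memory target (hypothetical). borrow() can run a one-shot spill hook midway. */
    static final class ToyTarget {
        private long used = 0;
        Runnable spillHook; // stands in for the retrying spill

        long usedBytes() { return used; }

        long borrow(long size) {
            long first = size / 2;
            used += first;              // partial grant: 0 < used < size
            if (spillHook != null) {
                Runnable hook = spillHook;
                spillHook = null;       // fire only once
                hook.run();             // spill runs while this borrow is unfinished
            }
            used += size - first;       // remainder granted after the spill
            return size;
        }

        long repay(long size) {
            long freed = Math.min(size, used);
            used -= freed;
            return freed;
        }
    }

    /** Toy stand-in for the over-acquire wrapper; the repay() check is switchable. */
    static final class ToyOverAcquire {
        final ToyTarget target = new ToyTarget();
        final ToyTarget overTarget = new ToyTarget();
        final boolean checkInRepay;

        ToyOverAcquire(boolean checkInRepay) { this.checkInRepay = checkInRepay; }

        long borrow(long size) {
            checkState(overTarget.usedBytes() == 0);   // precondition kept in borrow()
            long granted = target.borrow(size);
            long overSize = target.usedBytes() / 2;    // arbitrary ratio for the sketch
            long overAcquired = overTarget.borrow(overSize);
            overTarget.repay(overAcquired);            // over-acquired memory is returned at once
            return granted;
        }

        long repay(long size) {
            if (checkInRepay) {
                checkState(overTarget.usedBytes() == 0); // fails when re-entered from a spill
            }
            return target.repay(size);
        }
    }

    /** Returns true if repay(), re-entered from the spill, throws IllegalStateException. */
    static boolean repayThrowsDuringSpill(boolean checkInRepay) {
        ToyOverAcquire oa = new ToyOverAcquire(checkInRepay);
        oa.overTarget.spillHook = () -> oa.repay(10); // the spill frees memory through repay()
        try {
            oa.borrow(100);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(repayThrowsDuringSpill(true));  // true: check in repay() trips
        System.out.println(repayThrowsDuringSpill(false)); // false: check kept only in borrow()
    }
}
```

In this model, dropping the check from `repay()` lets the nested spill complete, while the check in `borrow()` still guards against entering a new borrow with over-acquired bytes outstanding.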
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_