wangyum opened a new issue, #10168:
URL: https://github.com/apache/incubator-gluten/issues/10168

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   It throws `LZ4 decompress failed` in read shuffle stage:
   ```
   org.apache.gluten.exception.GlutenException: 
org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 
0]: Error during calling Java code from native code: 
org.apache.gluten.exception.GlutenException: 
org.apache.gluten.exception.GlutenException: IOError: LZ4 decompress failed: 
ERROR_frameType_unknown
        at 
org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:53)
        at 
org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.liftedTree1$1(ColumnarBatchSerializer.scala:185)
        at 
org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:184)
        at 
org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at 
org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
        at 
org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native 
Method)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
        at 
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
        at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at 
org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:154)
        at 
org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
        at 
org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:95)
        at scala.collection.Iterator.isEmpty(Iterator.scala:387)
        at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.isEmpty(IteratorsV1.scala:85)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:121)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:77)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:899)
        at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:899)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:147)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:623)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:626)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: org.apache.gluten.exception.GlutenException: IOError: LZ4 
decompress failed: ERROR_frameType_unknown
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:62)
        at 
org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:51)
        ... 44 more
   
   Retriable: False
   Function: operator()
   File: /work/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
   Line: 571
   Stack trace:
   # 0  _ZN8facebook5velox7process10StackTraceC1Ei
   # 1  
_ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
   # 2  
_ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
   # 3  
_ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv.cold
   # 4  
_ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
   # 5  
_ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
   # 6  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
   # 7  _ZN6gluten24WholeStageResultIterator4nextEv
   # 8  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
   # 9  0x00007f4b8407375b
   
        at 
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
        at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at 
org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:154)
        at 
org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
        at 
org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:95)
        at scala.collection.Iterator.isEmpty(Iterator.scala:387)
        at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.isEmpty(IteratorsV1.scala:85)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:121)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:77)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:899)
        at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:899)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:147)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:623)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:626)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: org.apache.gluten.exception.GlutenException: Exception: 
VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 
0]: Error during calling Java code from native code: 
org.apache.gluten.exception.GlutenException: 
org.apache.gluten.exception.GlutenException: IOError: LZ4 decompress failed: 
ERROR_frameType_unknown
        at 
org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:53)
        at 
org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.liftedTree1$1(ColumnarBatchSerializer.scala:185)
        at 
org.apache.gluten.vectorized.ColumnarBatchSerializerInstance$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:184)
        at 
org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
        at 
org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
        at 
org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native 
Method)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
        at 
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
        at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at 
org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:154)
        at 
org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
        at 
org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:95)
        at scala.collection.Iterator.isEmpty(Iterator.scala:387)
        at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
        at 
org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.isEmpty(IteratorsV1.scala:85)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:121)
        at 
org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:77)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:899)
        at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:899)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:147)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:623)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:626)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: org.apache.gluten.exception.GlutenException: IOError: LZ4 
decompress failed: ERROR_frameType_unknown
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:62)
        at 
org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:51)
        ... 44 more
   
   Retriable: False
   Function: operator()
   File: /work/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
   Line: 571
   Stack trace:
   # 0  _ZN8facebook5velox7process10StackTraceC1Ei
   # 1  
_ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
   # 2  
_ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
   # 3  
_ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv.cold
   # 4  
_ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
   # 5  
_ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
   # 6  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
   # 7  _ZN6gluten24WholeStageResultIterator4nextEv
   # 8  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
   # 9  0x00007f4b8407375b
   
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native 
Method)
        at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
        at 
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
        ... 29 more
   ```
   
   <img width="3268" height="3406" alt="Image" 
src="https://github.com/user-attachments/assets/a7289611-50cf-4297-a93e-0c2192d68ddc";
 />
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to