ijbgreen opened a new issue, #11716:
URL: https://github.com/apache/incubator-gluten/issues/11716

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   When enabling columnar shuffle with the Velox backend using the following 
configuration:
   
   ```
   spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
   spark.gluten.sql.columnar.shuffle.enabled=true
   ```
   
   Expected behavior

   With these settings, Spark should execute the shuffle phase using Gluten's 
columnar shuffle implementation on the Velox backend. Queries such as loading a 
Parquet dataset and running simple operations like count() or aggregations 
should complete successfully.
   
   Example workload:
   ```
   val df = spark.read.parquet("parquet_file")
   df.count()
   ```
   
   or
   
   `df.groupBy("tipo_comprobante").count().show()`
   
   These operations are expected to run normally with Velox execution enabled.
   
   Actual behavior
   
   When columnar shuffle is enabled, Spark fails at runtime with an exception 
originating from the Velox execution pipeline. The job fails while processing 
the dataset and produces the following error:
   
   ```
   org.apache.gluten.exception.GlutenException: VeloxRuntimeError
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: TableScan]
   ```
   
   The root cause reported in the stack trace is:
   
   `java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available`
   
   The stack trace indicates the failure occurs during direct buffer allocation 
through Netty:
   
   ```
   io.netty.util.internal.PlatformDependent.directBuffer
   org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read
   ```
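   For context, the availability check that fails here can be approximated with 
a stdlib-only probe. This is a sketch of the kind of reflection Netty performs, 
not Netty's actual code, and the class name `DirectBufferProbe` is hypothetical:

   ```java
   import java.lang.reflect.Constructor;

   public class DirectBufferProbe {
       // Returns true when the private DirectByteBuffer(long, int) constructor
       // can be found and made accessible -- roughly the capability Netty
       // requires before it can allocate "no-cleaner" direct buffers.
       public static boolean probe() {
           try {
               Class<?> cls = Class.forName("java.nio.DirectByteBuffer");
               Constructor<?> ctor = cls.getDeclaredConstructor(long.class, int.class);
               // On JDK 16+ this throws InaccessibleObjectException unless the
               // JVM was started with --add-opens=java.base/java.nio=ALL-UNNAMED.
               ctor.setAccessible(true);
               return true;
           } catch (Throwable t) {
               // NoSuchMethodException (a JDK that changed the signature) and
               // InaccessibleObjectException (module system) both land here.
               return false;
           }
       }

       public static void main(String[] args) {
           System.out.println("DirectByteBuffer(long, int) accessible: " + probe());
       }
   }
   ```

   When this probe returns false on the executor JVM, Netty cannot construct 
the direct buffer that LowCopyFileSegmentJniByteInputStream asks for, which 
matches the exception above.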
   
   If the columnar shuffle configuration is removed, the same workload executes 
successfully: Velox still handles the Parquet scans and the job completes 
without errors.
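   For reference, the mitigation commonly suggested for this Netty error on 
JDK 16 and newer (an assumption, not verified against this issue) is to open 
the `java.nio` module to unnamed modules and enable Netty's reflective access, 
e.g. in spark-defaults:

   ```
   spark.driver.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true
   spark.executor.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true
   ```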
   
   This issue description was written with the assistance of AI.
   
   ### Gluten version
   
   Gluten-1.5, main branch
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   ```
   spark.plugins=org.apache.gluten.GlutenPlugin
   spark.gluten.sql.columnar.backend=velox
   spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
   spark.gluten.sql.columnar.shuffle.enabled=true
   spark.memory.offHeap.enabled=true
   spark.memory.offHeap.size=4g
   ```
   
   ### System information
   
   ```
   Gluten Version: 1.7.0-SNAPSHOT
   Commit: 096545f03c4d8aa550902b13d2775a7ae2816599
   CMake Version: 3.30.4
   System: Linux-6.8.0-101-generic
   Arch: x86_64
   CPU Name: 12th Gen Intel(R) Core(TM) i7-1255U
   C++ Compiler: /usr/bin/c++
   C++ Compiler Version: 13.3.0
   C Compiler: /usr/bin/cc
   C Compiler Version: 13.3.0
   CMake Prefix Path: /usr/local;/usr;/;/server/spark/.local/share/uv/tools/cmake/lib/python3.12/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
   ```
   
   ### Relevant logs
   
   ```bash
   Caused by: org.apache.gluten.exception.GlutenException: Exception: 
VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: 
value-stream:0]
   
   Caused by: org.apache.gluten.exception.GlutenException:
   Error during calling Java code from native code:
   java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
   
   at 
io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:534)
   at 
org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read(LowCopyFileSegmentJniByteInputStream.java:100)
   at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native 
Method)
   at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:70)
   at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:28)
   at org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:48)
   at 
org.apache.gluten.vectorized.ColumnarBatchSerializerInstanceImpl$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:187)
   at 
org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
   at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
   at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
   at 
org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
   at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native 
Method)
   at 
org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:65)
   at 
org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
   at 
org.apache.gluten.execution.VeloxColumnarToRowExec.toRowIterator(VeloxColumnarToRowExec.scala:118)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   at java.lang.Thread.run(Thread.java:840)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

