ijbgreen opened a new issue, #11716:
URL: https://github.com/apache/incubator-gluten/issues/11716
### Backend
VL (Velox)
### Bug description
When enabling columnar shuffle with the Velox backend using the following
configuration:
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true
**Expected behavior**

Spark should execute the shuffle phase using Gluten's columnar shuffle
implementation with the Velox backend. Queries such as loading a Parquet
dataset and running simple operations like count() or aggregations should
complete successfully.
Example workload:
```scala
val df = spark.read.parquet("parquet_file")
df.count()
```
or
`df.groupBy("tipo_comprobante").count().show()`
These operations are expected to run normally with Velox execution enabled.
**Actual behavior**
When columnar shuffle is enabled, Spark fails at runtime with an exception
originating from the Velox execution pipeline. The job fails while processing
the dataset and produces the following error:
```
org.apache.gluten.exception.GlutenException: VeloxRuntimeError
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan]
```
The root cause reported in the stack trace is:
`java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.<init>(long, int) not available`
The stack trace indicates the failure occurs during direct buffer allocation
through Netty:
`io.netty.util.internal.PlatformDependent.directBuffer`
`org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read`
If the columnar shuffle configuration is removed, the same workload executes
successfully using Velox for Parquet scans and the job completes without errors.
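This particular `UnsupportedOperationException` is the message Netty's `PlatformDependent` raises on Java 16+ when reflective access to `java.nio` internals is blocked by the module system. As a possible workaround (a sketch only; these are the standard Netty/Arrow JVM options and have not been verified against this Gluten build), the relevant packages can be opened on both the driver and the executors:

```shell
# Workaround sketch, assuming the failure stems from JDK module restrictions
# on Java 17+. Opens java.nio internals and asks Netty to retry reflective
# access. Untested against this specific Gluten/Velox setup.
spark-shell \
  --conf spark.driver.extraJavaOptions="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.executor.extraJavaOptions="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true"
```

If the flags help, that would suggest an environment issue (JDK version) rather than a bug in the columnar shuffle path itself.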
This issue description was written with the assistance of AI.
### Gluten version
Gluten-1.5, main branch
### Spark version
Spark-3.5.x
### Spark configurations
spark.plugins=org.apache.gluten.GlutenPlugin
spark.gluten.sql.columnar.backend=velox
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.shuffle.enabled=true
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=4g
### System information
Gluten Version: 1.7.0-SNAPSHOT
Commit: 096545f03c4d8aa550902b13d2775a7ae2816599
CMake Version: 3.30.4
System: Linux-6.8.0-101-generic
Arch: x86_64
CPU Name: 12th Gen Intel(R) Core(TM) i7-1255U
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 13.3.0
C Compiler: /usr/bin/cc
C Compiler Version: 13.3.0
CMake Prefix Path:
/usr/local;/usr;/;/server/spark/.local/share/uv/tools/cmake/lib/python3.12/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
```bash
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: value-stream:0]
Caused by: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:534)
    at org.apache.gluten.vectorized.LowCopyFileSegmentJniByteInputStream.read(LowCopyFileSegmentJniByteInputStream.java:100)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeNext(Native Method)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:70)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.next0(ColumnarBatchOutIterator.java:28)
    at org.apache.gluten.iterator.ClosableIterator.next(ClosableIterator.java:48)
    at org.apache.gluten.vectorized.ColumnarBatchSerializerInstanceImpl$TaskDeserializationStream.readValue(ColumnarBatchSerializer.scala:187)
    at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
    at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:65)
    at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
    at org.apache.gluten.execution.VeloxColumnarToRowExec.toRowIterator(VeloxColumnarToRowExec.scala:118)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:840)
```