advancedxy opened a new issue, #102:
URL: https://github.com/apache/arrow-datafusion-comet/issues/102
### Describe the bug
When testing #100, I noticed that Comet's columnar shuffle doesn't handle
empty projections correctly. The shuffle write
thread throws the following exception:
```
Caused by: org.apache.comet.CometNativeException: Arrow error: External error: Arrow error: Invalid argument error: must either specify a row count or at least one column
    at org.apache.comet.Native.executePlan(Native Method)
    at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
    at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
    at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
    at org.apache.spark.sql.comet.execution.shuffle.CometShuffleWriteProcessor.write(CometShuffleExchangeExec.scala:290)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
```
### Steps to reproduce
In `org.apache.comet.exec.CometExecSuite`, modify the `empty projection` test
case as follows:
```scala
test("empty projection") {
  withParquetDataFrame((0 until 5).map(i => (i, i + 1))) { df =>
    assert(df.where("_1 IS NOT NULL").count() == 5)
    checkSparkAnswerAndOperator(df)
    assert(df.select().limit(2).count() === 2)
  }
}
```
### Expected behavior
The modified test case should pass.
### Additional context
_No response_