advancedxy opened a new issue, #102:
URL: https://github.com/apache/arrow-datafusion-comet/issues/102

   ### Describe the bug
   
   When testing #100, I noticed that Comet's columnar shuffle doesn't handle an 
empty projection correctly. The shuffle write thread throws the following 
exception:
   
   ```
   Caused by: org.apache.comet.CometNativeException: Arrow error: External 
error: Arrow error: Invalid argument error: must either specify a row count or 
at least one column
        at org.apache.comet.Native.executePlan(Native Method)
        at 
org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
        at 
org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
        at 
org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
        at 
org.apache.spark.sql.comet.execution.shuffle.CometShuffleWriteProcessor.write(CometShuffleExchangeExec.scala:290)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ```
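   For context (my reading of the error, not stated in the issue itself): the message comes from arrow-rs, which rejects a `RecordBatch` with zero columns unless an explicit row count is supplied through `RecordBatchOptions`. An empty projection produces exactly such zero-column batches. A minimal sketch of that arrow-rs behavior, assuming the `arrow` crate:
   
   ```rust
   use std::sync::Arc;
   use arrow::datatypes::Schema;
   use arrow::record_batch::{RecordBatch, RecordBatchOptions};
   
   fn main() {
       let schema = Arc::new(Schema::empty());
   
       // A zero-column batch with no explicit row count is rejected,
       // producing the "must either specify a row count or at least
       // one column" error seen in the stack trace above.
       let err = RecordBatch::try_new(schema.clone(), vec![]).unwrap_err();
       println!("{err}");
   
       // Supplying a row count via RecordBatchOptions makes the
       // zero-column batch valid.
       let options = RecordBatchOptions::new().with_row_count(Some(2));
       let batch = RecordBatch::try_new_with_options(schema, vec![], &options)
           .expect("zero-column batch with explicit row count");
       assert_eq!(batch.num_rows(), 2);
       assert_eq!(batch.num_columns(), 0);
   }
   ```
   
   This suggests the native shuffle path would need to carry the row count explicitly when the projected schema is empty, rather than inferring it from the (absent) columns.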
   
   ### Steps to reproduce
   
   In `org.apache.comet.exec.CometExecSuite`, modify the `empty projection` test 
case as follows:
   ```scala
     test("empty projection") {
       withParquetDataFrame((0 until 5).map(i => (i, i + 1))) { df =>
         assert(df.where("_1 IS NOT NULL").count() == 5)
         checkSparkAnswerAndOperator(df)
         assert(df.select().limit(2).count() === 2)
       }
     }
   ```
   
   ### Expected behavior
   
   The test case should pass.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
