Re: [PR] fix: coalesce should return correct datatype [arrow-datafusion-comet]

via GitHub Tue, 05 Mar 2024 08:53:38 -0800


viirya commented on code in PR #168:
URL: https://github.com/apache/arrow-datafusion-comet/pull/168#discussion_r1513169860



##########
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##########
@@ -34,6 +34,19 @@ import org.apache.comet.CometSparkSessionExtensions.{isSpark32, isSpark33Plus, i
 class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
   import testImplicits._
 
+  test("coalesce should return correct datatype") {
+    Seq(true, false).foreach { dictionaryEnabled =>
+      withTempDir { dir =>
+        val path = new Path(dir.toURI.toString, "test.parquet")
+        makeParquetFileAllTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
+        withParquetTable(path.toString, "tbl") {
+          checkSparkAnswerAndOperator(
+            "SELECT coalesce(cast(_18 as date), cast(_19 as date), _20) FROM tbl")
+        }
+      }
+    }
+  }

Review Comment:
   Due to https://github.com/apache/arrow-datafusion/issues/9458, the declared return type of DataFusion's `coalesce` function differs from the data type of the array it actually produces:
   
   ```
     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.86.44 executor driver): org.apache.comet.CometNativeException: Arrow error: Invalid argument error: column types must match schema types, expected Utf8 but found Date32 at column index 0
           at org.apache.comet.Native.executePlan(Native Method)
           at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
           at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
           at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
   ```
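
   To make the failure mode concrete, here is a minimal, dependency-free sketch of the kind of check that appears to reject the batch. Every name in it (`DataType`, `Column`, `validateBatch`) is hypothetical and only models the behavior; none of it is the Arrow, DataFusion, or Comet API. It assumes the plan declared `Utf8` for the `coalesce` output while the produced array was `Date32`, which is what the error message reports.

   ```scala
   // Hypothetical model of the mismatch; these are NOT Arrow/DataFusion types.
   object CoalesceTypeMismatch {
     sealed trait DataType
     case object Utf8 extends DataType
     case object Date32 extends DataType

     // A column carries the type of the data it actually holds.
     final case class Column(dataType: DataType, values: Seq[Option[Any]])

     // Each column's actual type must equal the type declared in the schema;
     // the first mismatch is reported in the same wording as the error above.
     def validateBatch(schema: Seq[DataType], columns: Seq[Column]): Either[String, Unit] = {
       val mismatch = schema.zip(columns).zipWithIndex.collectFirst {
         case ((expected, col), idx) if expected != col.dataType =>
           s"column types must match schema types, expected $expected " +
             s"but found ${col.dataType} at column index $idx"
       }
       mismatch.toLeft(())
     }

     // Assumption: the declared output type is Utf8, the produced array is Date32.
     val declaredSchema: Seq[DataType] = Seq(Utf8)
     val actualOutput: Seq[Column] = Seq(Column(Date32, Seq(Some(19787), None)))

     val result: Either[String, Unit] = validateBatch(declaredSchema, actualOutput)

     def main(args: Array[String]): Unit = println(result)
   }
   ```

   In this model the check passes once the declared type and the produced array type agree, for example `validateBatch(Seq(Date32), actualOutput)`.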



