cfmcgrady commented on code in PR #2481:
URL: https://github.com/apache/datafusion-comet/pull/2481#discussion_r2399209075
##########
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala:
##########
@@ -163,21 +175,26 @@ class CometArrayExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelp
 withSQLConf(CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true") {
   Seq(true, false).foreach { dictionaryEnabled =>
     withTempDir { dir =>
-      val path = new Path(dir.toURI.toString, "test.parquet")
-      makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
-      spark.read.parquet(path.toString).createOrReplaceTempView("t1");
-      checkSparkAnswerAndOperator(spark.sql("Select array_prepend(array(_1),false) from t1"))
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_2, _3, _4), 4) FROM t1"))
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_2, _3, _4), null) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_6, _7), CAST(6.5 AS DOUBLE)) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_8), 'test') FROM t1"));
-      checkSparkAnswerAndOperator(spark.sql("SELECT array_prepend(array(_19), _19) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend((CASE WHEN _2 =_3 THEN array(_4) END), _4) FROM t1"));
+      withTempView("t1") {
+        val path = new Path(dir.toURI.toString, "test.parquet")
+        makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
+        spark.read.parquet(path.toString).createOrReplaceTempView("t1");
Review Comment:
Sorry for the late reply - I was on holiday for Chinese National Day.
This change is needed to isolate temp views between tests within the same
`Suite`, just as `withSQLConf(...)` isolates config changes (a sketch of such
a helper follows this list).
1. A view created by `createOrReplaceTempView` lives as long as the Spark
session that created the Dataset. From the method's Scaladoc:
```scala
/**
 * Creates a local temporary view using the given name. The lifetime of this
 * temporary view is tied to the [[SparkSession]] that was used to create this Dataset.
 */
```
2. Tests within the same `Suite` share the same session.
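For reference, here is a minimal sketch of what such a view-scoping helper
looks like. It is modeled on Spark's `SQLTestUtils.withTempView`; the exact
variant our suites inherit may differ in details, and it assumes a
`spark: SparkSession` in scope, as there is in the test suites:
```scala
// A minimal sketch modeled on Spark's SQLTestUtils.withTempView, assuming a
// `spark: SparkSession` is in scope as it is in the test suites.
protected def withTempView(viewNames: String*)(body: => Unit): Unit = {
  try body
  finally {
    // Drop the views even if the body throws, so later tests in the same
    // suite (which share this SparkSession) never see stale views.
    viewNames.foreach(spark.catalog.dropTempView(_))
  }
}
```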
You can verify that tests share the same session by following these steps:
1. Add the following code to the file `CometArrayExpressionSuite` on the
`main` branch:
```scala
test("show temp view") {
  spark.sql("show tables").show(truncate = false)
  spark.sql("select * from t1").show
}
```
2. Run the suite.
```
./mvnw test -DargLine="-XX:+IgnoreUnrecognizedVMOptions
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Djava.library.path=/path/to/arrow-datafusion-comet/native/target/release"
-Dtest=none -Dsuites="org.apache.comet.CometArrayExpressionSuite"
```
Output:
```
Run starting. Expected test count is: 28
CometArrayExpressionSuite:
25/10/02 23:00:22 INFO core/src/lib.rs: Comet native library version 0.11.0 initialized
- array_remove - integer (4 seconds, 229 milliseconds)
- array_remove - test all types (native Parquet reader) (2 seconds, 871 milliseconds)
- array_remove - test all types (convert from Parquet) (4 seconds, 397 milliseconds)
- array_remove - fallback for unsupported type struct (196 milliseconds)
- array_append (2 seconds, 592 milliseconds)
- array_prepend (2 seconds, 318 milliseconds)
- ArrayInsert (1 second, 794 milliseconds)
- ArrayInsertUnsupportedArgs (267 milliseconds)
- array_contains - int values (395 milliseconds)
- array_contains - test all types (native Parquet reader) (5 seconds, 378 milliseconds)
- array_contains - array literals (1 second, 635 milliseconds)
- array_contains - test all types (convert from Parquet) (2 seconds, 886 milliseconds)
- array_distinct (1 second, 619 milliseconds)
- array_union (1 second, 639 milliseconds)
- array_max (2 seconds, 140 milliseconds)
- array_min (2 seconds, 92 milliseconds)
- array_intersect (1 second, 161 milliseconds)
- array_join (1 second, 218 milliseconds)
- arrays_overlap (1 second, 20 milliseconds)
- array_compact (1 second, 34 milliseconds)
- array_except - basic test (only integer values) (1 second, 204 milliseconds)
- array_except - test all types (native Parquet reader) (1 second, 762 milliseconds)
- array_except - test all types (convert from Parquet) (2 seconds, 682 milliseconds)
- array_repeat (1 second, 526 milliseconds)
- flatten - test all types (native Parquet reader) (1 second, 809 milliseconds)
- flatten - test all types (convert from Parquet) (2 seconds, 858 milliseconds)
- array literals (396 milliseconds)
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|         |t1       |true       |
|         |t2       |true       |
|         |t3       |true       |
+---------+---------+-----------+
- temp view *** FAILED *** (139 milliseconds)
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1122.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1122.0 (TID 3714) (172.31.25.39 executor driver): org.apache.spark.SparkFileNotFoundException: File file:/Users/fchen/Project/arrow-datafusion-comet/spark/target/tmp/spark-aa0feecf-5344-4e11-a30c-e9defa6e093d/test.parquet does not exist
  It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.readCurrentFileNotFoundError(QueryExecutionErrors.scala:781)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:222)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:282)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
```
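That failure is exactly the leak this PR fixes: the `t1` view left over from
an earlier test still points at a `test.parquet` inside a temp directory that
`withTempDir` already deleted when that test finished, so any later test that
reads the view hits `SparkFileNotFoundException`. Scoping the view fixes the
ordering; as a rough sketch reusing the names from the diff above (the elided
assertions stand in for the real ones):
```scala
withTempDir { dir =>
  withTempView("t1") {
    val path = new Path(dir.toURI.toString, "test.parquet")
    makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = true, 10000)
    spark.read.parquet(path.toString).createOrReplaceTempView("t1")
    // ... assertions against t1 ...
  } // t1 is dropped here, before withTempDir deletes the directory
}
```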
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.