cfmcgrady commented on code in PR #2481:
URL: https://github.com/apache/datafusion-comet/pull/2481#discussion_r2399209075
##########
spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala:
##########
@@ -163,21 +175,26 @@ class CometArrayExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelp
 withSQLConf(CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.key -> "true") {
   Seq(true, false).foreach { dictionaryEnabled =>
     withTempDir { dir =>
-      val path = new Path(dir.toURI.toString, "test.parquet")
-      makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
-      spark.read.parquet(path.toString).createOrReplaceTempView("t1");
-      checkSparkAnswerAndOperator(spark.sql("Select array_prepend(array(_1),false) from t1"))
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_2, _3, _4), 4) FROM t1"))
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_2, _3, _4), null) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_6, _7), CAST(6.5 AS DOUBLE)) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend(array(_8), 'test') FROM t1"));
-      checkSparkAnswerAndOperator(spark.sql("SELECT array_prepend(array(_19), _19) FROM t1"));
-      checkSparkAnswerAndOperator(
-        spark.sql("SELECT array_prepend((CASE WHEN _2 =_3 THEN array(_4) END), _4) FROM t1"));
+      withTempView("t1") {
+        val path = new Path(dir.toURI.toString, "test.parquet")
+        makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
+        spark.read.parquet(path.toString).createOrReplaceTempView("t1");
Review Comment:
Sorry for the late reply - I was on holiday for Chinese National Day.
This change is needed to isolate temp views between tests within the same
`Suite`, just as `withSQLConf(...)` isolates config changes (a sketch of such
a helper follows this list).
1. A view created by `createOrReplaceTempView` lives as long as the Spark
session that created the Dataset. From the method's Scaladoc:
```scala
/**
 * Creates a local temporary view using the given name. The lifetime of this
 * temporary view is tied to the [[SparkSession]] that was used to create this Dataset.
 */
```
2. Tests within the same `Suite` share the same session.
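For reference, here is a minimal sketch of what such a view-scoping helper
looks like. It is modeled on Spark's `SQLTestUtils.withTempView`; the exact
variant our suites inherit may differ in details, and it assumes a
`spark: SparkSession` in scope, as there is in the test suites:
```scala
// A minimal sketch modeled on Spark's SQLTestUtils.withTempView, assuming a
// `spark: SparkSession` is in scope as it is in the test suites.
protected def withTempView(viewNames: String*)(body: => Unit): Unit = {
  try body
  finally {
    // Drop the views even if the body throws, so later tests in the same
    // suite (which share this SparkSession) never see stale views.
    viewNames.foreach(spark.catalog.dropTempView(_))
  }
}
```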
You can verify that tests share the same session by following these steps:
1. Add the following code to the file `CometArrayExpressionSuite` on the
`main` branch:
```scala
test("show temp view") {
  spark.sql("show tables").show(truncate = false)
  spark.sql("select * from t1").show
}
```
2. Run the suite.
```
./mvnw test -DargLine="-XX:+IgnoreUnrecognizedVMOptions
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Djava.library.path=/path/to/arrow-datafusion-comet/native/target/release"
-Dtest=none -Dsuites="org.apache.comet.CometArrayExpressionSuite"
```
Output:
```
Run starting. Expected test count is: 28
CometArrayExpressionSuite:
25/10/02 23:00:22 INFO core/src/lib.rs: Comet native library version 0.11.0 initialized
- array_remove - integer (4 seconds, 229 milliseconds)
- array_remove - test all types (native Parquet reader) (2 seconds, 871 milliseconds)
- array_remove - test all types (convert from Parquet) (4 seconds, 397 milliseconds)
- array_remove - fallback for unsupported type struct (196 milliseconds)
- array_append (2 seconds, 592 milliseconds)
- array_prepend (2 seconds, 318 milliseconds)
- ArrayInsert (1 second, 794 milliseconds)
- ArrayInsertUnsupportedArgs (267 milliseconds)
- array_contains - int values (395 milliseconds)
- array_contains - test all types (native Parquet reader) (5 seconds, 378 milliseconds)
- array_contains - array literals (1 second, 635 milliseconds)
- array_contains - test all types (convert from Parquet) (2 seconds, 886 milliseconds)
- array_distinct (1 second, 619 milliseconds)
- array_union (1 second, 639 milliseconds)
- array_max (2 seconds, 140 milliseconds)
- array_min (2 seconds, 92 milliseconds)
- array_intersect (1 second, 161 milliseconds)
- array_join (1 second, 218 milliseconds)
- arrays_overlap (1 second, 20 milliseconds)
- array_compact (1 second, 34 milliseconds)
- array_except - basic test (only integer values) (1 second, 204 milliseconds)
- array_except - test all types (native Parquet reader) (1 second, 762 milliseconds)
- array_except - test all types (convert from Parquet) (2 seconds, 682 milliseconds)
- array_repeat (1 second, 526 milliseconds)
- flatten - test all types (native Parquet reader) (1 second, 809 milliseconds)
- flatten - test all types (convert from Parquet) (2 seconds, 858 milliseconds)
- array literals (396 milliseconds)
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|         |t1       |true       |
|         |t2       |true       |
|         |t3       |true       |
+---------+---------+-----------+
- temp view *** FAILED *** (139 milliseconds)
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1122.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1122.0 (TID 3714) (172.31.25.39 executor driver): org.apache.spark.SparkFileNotFoundException: File file:/Users/fchen/Project/arrow-datafusion-comet/spark/target/tmp/spark-aa0feecf-5344-4e11-a30c-e9defa6e093d/test.parquet does not exist
  It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.readCurrentFileNotFoundError(QueryExecutionErrors.scala:781)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:222)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:282)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
```
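That failure is exactly the leak this PR fixes: the `t1` view left over from
an earlier test still points at a `test.parquet` inside a temp directory that
`withTempDir` already deleted when that test finished, so any later test that
reads the view hits `SparkFileNotFoundException`. Scoping the view fixes the
ordering; as a rough sketch reusing the names from the diff above (the elided
assertions stand in for the real ones):
```scala
withTempDir { dir =>
  withTempView("t1") {
    val path = new Path(dir.toURI.toString, "test.parquet")
    makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = true, 10000)
    spark.read.parquet(path.toString).createOrReplaceTempView("t1")
    // ... assertions against t1 ...
  } // t1 is dropped here, before withTempDir deletes the directory
}
```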
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.