This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c1888cdf5361 [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in
`ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true
c1888cdf5361 is described below
commit c1888cdf53610909af996c7f41ee0cd7ee0691db
Author: yangjie01 <[email protected]>
AuthorDate: Mon Dec 25 15:42:13 2023 -0800
[SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in
`ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true
### What changes were proposed in this pull request?
This PR changes the test inputs in `ParquetTypeWideningSuite` to valid integer
strings, fixing the tests that fail when `SPARK_ANSI_SQL_MODE` is set to `true`.
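The failure boils down to ANSI-mode string-to-INT casts rejecting any string that is not a complete integer literal, so inputs like `"1.23"` throw where `"1"` and `"10"` succeed. The distinction can be sketched in plain Scala outside Spark (the helper names `ansiToInt` and `tryToInt` are illustrative only, not Spark APIs):

```scala
// ANSI-style cast: strict parse of the whole string; "1.23" throws a
// NumberFormatException, analogous to the SparkNumberFormatException
// raised by `CAST('1.23' AS INT)` under ANSI mode.
def ansiToInt(s: String): Int = s.toInt

// try_cast-style: tolerate malformed input and return None instead of
// throwing, mirroring the `try_cast` suggestion in the error message.
def tryToInt(s: String): Option[Int] =
  try Some(s.toInt) catch { case _: NumberFormatException => None }
```

With the old inputs, the strict parse is what aborted the write inside the generated code; the follow-up simply feeds values such as `"1"` that parse cleanly as INT.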
### Why are the changes needed?
Fix the daily test failures when `SPARK_ANSI_SQL_MODE` is set to `true`:
- https://github.com/apache/spark/actions/runs/7318074558/job/19934321639
- https://github.com/apache/spark/actions/runs/7305312703/job/19908735746
- https://github.com/apache/spark/actions/runs/7311683968/job/19921532402
```
[info] - unsupported parquet conversion IntegerType -> TimestampType ***
FAILED *** (68 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure:
Task 1 in stage 261.0 failed 1 times, most recent failure: Lost task 1.0 in
stage 261.0 (TID 523) (localhost executor driver):
org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value
'1.23' of the type "STRING" cannot be cast to "INT" because it is malformed.
Correct the value as per the syntax, or change its target type. Use `try_cast`
to tolerate malformed input and return NULL instead. I [...]
[info] == DataFrame ==
[info] "cast" was called from
[info]
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
[info]
[info] at
org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils.toIntExact(UTF8StringUtils.scala)
[info] at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
[info] at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:388)
[info] at
org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:101)
[info] at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
[info] at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
[info] at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
[info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[info] at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info] at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:628)
[info] at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
[info] at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:631)
[info] at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] at java.base/java.lang.Thread.run(Thread.java:840)
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Manual check
```
SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite"
```
**Before**
```
[info] Run completed in 27 seconds, 432 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite
```
**After**
```
[info] Run completed in 28 seconds, 880 milliseconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44481 from LuciferYang/SPARK-40876-FOLLOWUP.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../execution/datasources/parquet/ParquetTypeWideningSuite.scala | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
index 72580f7078e2..0a8618944241 100644
---
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
+++
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
@@ -166,9 +166,9 @@ class ParquetTypeWideningSuite
(Seq("1", "2", Int.MinValue.toString), LongType, IntegerType),
(Seq("1.23", "10.34"), DoubleType, FloatType),
(Seq("1.23", "10.34"), FloatType, LongType),
- (Seq("1.23", "10.34"), LongType, DateType),
- (Seq("1.23", "10.34"), IntegerType, TimestampType),
- (Seq("1.23", "10.34"), IntegerType, TimestampNTZType),
+ (Seq("1", "10"), LongType, DateType),
+ (Seq("1", "10"), IntegerType, TimestampType),
+ (Seq("1", "10"), IntegerType, TimestampNTZType),
(Seq("2020-01-01", "2020-01-02", "1312-02-27"), DateType, TimestampType)
)
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]