This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c1888cdf5361 [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in
`ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true
c1888cdf5361 is described below
commit c1888cdf53610909af996c7f41ee0cd7ee0691db
Author: yangjie01 <[email protected]>
AuthorDate: Mon Dec 25 15:42:13 2023 -0800
[SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in
`ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true
### What changes were proposed in this pull request?
This PR changes the test inputs in `ParquetTypeWideningSuite` to valid integer
strings, fixing the tests that fail when `SPARK_ANSI_SQL_MODE` is set to `true`.
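The failure boils down to ANSI-mode string-to-INT casts rejecting any string that is not a complete integer literal, so inputs like `"1.23"` throw where `"1"` and `"10"` succeed. The distinction can be sketched in plain Scala outside Spark (the helper names `ansiToInt` and `tryToInt` are illustrative only, not Spark APIs):

```scala
// ANSI-style cast: strict parse of the whole string; "1.23" throws a
// NumberFormatException, analogous to the SparkNumberFormatException
// raised by `CAST('1.23' AS INT)` under ANSI mode.
def ansiToInt(s: String): Int = s.toInt

// try_cast-style: tolerate malformed input and return None instead of
// throwing, mirroring the `try_cast` suggestion in the error message.
def tryToInt(s: String): Option[Int] =
  try Some(s.toInt) catch { case _: NumberFormatException => None }
```

With the old inputs, the strict parse is what aborted the write inside the generated code; the follow-up simply feeds values such as `"1"` that parse cleanly as INT.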
### Why are the changes needed?
Fix the daily test failures when `SPARK_ANSI_SQL_MODE` is set to `true`:
- https://github.com/apache/spark/actions/runs/7318074558/job/19934321639
- https://github.com/apache/spark/actions/runs/7305312703/job/19908735746
- https://github.com/apache/spark/actions/runs/7311683968/job/19921532402
```
[info] - unsupported parquet conversion IntegerType -> TimestampType ***
FAILED *** (68 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure:
Task 1 in stage 261.0 failed 1 times, most recent failure: Lost task 1.0 in
stage 261.0 (TID 523) (localhost executor driver):
org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value
'1.23' of the type "STRING" cannot be cast to "INT" because it is malformed.
Correct the value as per the syntax, or change its target type. Use `try_cast`
to tolerate malformed input and return NULL instead. I [...]
[info] == DataFrame ==
[info] "cast" was called from
[info]
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
[info]
[info] at
org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
[info] at
org.apache.spark.sql.catalyst.util.UTF8StringUtils.toIntExact(UTF8StringUtils.scala)
[info] at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
[info] at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
[info] at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:388)
[info] at
org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:101)
[info] at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
[info] at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
[info] at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
[info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[info] at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info] at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:628)
[info] at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
[info] at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:631)
[info] at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] at java.base/java.lang.Thread.run(Thread.java:840)
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Manual check
```
SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite"
```
**Before**
```
[info] Run completed in 27 seconds, 432 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite
```
**After**
```
[info] Run completed in 28 seconds, 880 milliseconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44481 from LuciferYang/SPARK-40876-FOLLOWUP.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../execution/datasources/parquet/ParquetTypeWideningSuite.scala | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
index 72580f7078e2..0a8618944241 100644
---
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
+++
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
@@ -166,9 +166,9 @@ class ParquetTypeWideningSuite
(Seq("1", "2", Int.MinValue.toString), LongType, IntegerType),
(Seq("1.23", "10.34"), DoubleType, FloatType),
(Seq("1.23", "10.34"), FloatType, LongType),
- (Seq("1.23", "10.34"), LongType, DateType),
- (Seq("1.23", "10.34"), IntegerType, TimestampType),
- (Seq("1.23", "10.34"), IntegerType, TimestampNTZType),
+ (Seq("1", "10"), LongType, DateType),
+ (Seq("1", "10"), IntegerType, TimestampType),
+ (Seq("1", "10"), IntegerType, TimestampNTZType),
(Seq("2020-01-01", "2020-01-02", "1312-02-27"), DateType, TimestampType)
)
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]