vidyasankarv opened a new issue, #440:
URL: https://github.com/apache/datafusion-comet/issues/440

   ### Describe the bug
   
   When a String that is an invalid date is cast to DateType:
   
   In Spark 3.2 the error message is:
   
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
2) (192.168.1.10 executor driver): java.time.DateTimeException: Cannot cast 0 
to DateType.
   ```
   
   In Spark 3.3 and above the error message is:
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
2) (192.168.1.10 executor driver): org.apache.spark.SparkDateTimeException: 
[CAST_INVALID_INPUT] The value '0' of the type "STRING" cannot be cast to 
"DATE" because it is malformed. Correct the value as per the syntax, or change 
its target type. Use `try_cast` to tolerate malformed input and return NULL 
instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this 
error.
   ```
   
   Currently, Comet's error messages match Spark 3.3 and above.
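
   The two message formats quoted above could, in principle, be selected based on the Spark version. This is a minimal sketch, not Comet's actual implementation; the object name, method signature, and message templates are assumptions based only on the messages quoted in this issue:

   ```
   // Sketch: pick the error message format for an invalid date cast
   // depending on whether the running Spark is 3.3+ (hypothetical helper).
   object InvalidDateCastError {
     def message(value: String, spark33Plus: Boolean): String =
       if (spark33Plus)
         // Spark 3.3+ style (SparkDateTimeException, CAST_INVALID_INPUT)
         s"""[CAST_INVALID_INPUT] The value '$value' of the type "STRING" cannot be cast to "DATE" because it is malformed."""
       else
         // Spark 3.2 style (java.time.DateTimeException)
         s"Cannot cast $value to DateType."
   }
   ```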
   
   ### Steps to reproduce
   
   
   Currently, in CometTestSuite we have added an assumption so that this test only runs on Spark 3.3 and above.
   Removing that assumption triggers a test failure when the test suite is run in the following environment: **jdk-1.8 and spark-3.2.0**
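
   The version gate described above could look roughly like the following sketch; the helper name is hypothetical and only illustrates the kind of check the assumption performs:

   ```
   // Sketch: decide whether the running Spark version is 3.3 or newer
   // from a version string such as "3.2.0" (helper name is hypothetical).
   def isSpark33Plus(version: String): Boolean = {
     val Array(major, minor) = version.split("\\.").take(2).map(_.toInt)
     major > 3 || (major == 3 && minor >= 3)
   }
   ```

   In a ScalaTest suite this would typically be used as `assume(isSpark33Plus(spark.version))`, which skips the test instead of failing it on older Spark versions.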
   
   
   Additionally, you can reproduce this error locally using the Spark shell:
   
   `$SPARK_HOME/bin/spark-shell --conf spark.sql.ansi.enabled=true`
   
   ```
   import org.apache.spark.sql._
   import org.apache.spark.sql.types._

   import java.io.File
   import java.nio.file.Files

   // Write the DataFrame to Parquet and read it back.
   def roundtripParquet(df: DataFrame): DataFrame = {
     val tempDir = Files.createTempDirectory("spark").toString
     val filename = new File(tempDir, s"castTest_${System.currentTimeMillis()}.parquet").toString
     df.write.mode(SaveMode.Overwrite).parquet(filename)
     spark.read.parquet(filename)
   }

   import spark.implicits._

   val data = roundtripParquet(Seq("0").toDF("a"))
   data.createOrReplaceTempView("t")
   val df = spark.sql(s"select a, cast(a as ${DataTypes.DateType.sql}) from t order by a")
   df.collect().foreach(println)
   ```
   
   
   
   ### Expected behavior
   
   The CometTestSuite test `cast String to DateType` should pass in all environments.
   
   
   ### Additional context
   
   https://github.com/apache/datafusion-comet/pull/383#issuecomment-2115341055

