[ https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajkumar Singh updated SPARK-30688:
-----------------------------------
    Description: 
 
{code:java}
scala> spark.sql("select unix_timestamp('20201', 'yyyyww')").show();
+-----------------------------+
|unix_timestamp(20201, yyyyww)|
+-----------------------------+
|                         null|
+-----------------------------+
 
scala> spark.sql("select unix_timestamp('20202', 'yyyyww')").show();
+-----------------------------+
|unix_timestamp(20202, yyyyww)|
+-----------------------------+
|                   1578182400|
+-----------------------------+
 
{code}
 

 

This seems to happen only for leap years. Digging deeper, it seems that Spark uses java.text.SimpleDateFormat and tries to parse the expression here:

[org.apache.spark.sql.catalyst.expressions.UnixTime#eval|https://github.com/hortonworks/spark2/blob/49ec35bbb40ec6220282d932c9411773228725be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L652]
{code:java}
formatter.parse(
  t.asInstanceOf[UTF8String].toString).getTime / 1000L
{code}
but the parse fails: SimpleDateFormat cannot parse the date and throws a java.text.ParseException ("Unparseable date"), which Spark handles silently, returning NULL.
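The parse step can be reproduced outside Spark with a small Java sketch. It assumes the formatter is built with Locale.US, a UTC time zone, and strict (non-lenient) parsing; those settings are assumptions about how Spark 2.x configures it and are not stated in this report. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

class WeekPatternParseDemo {
    // Parses text with the given pattern in UTC and returns epoch seconds,
    // or null when SimpleDateFormat throws ParseException (mirroring the
    // NULL that the query above returns).
    static Long parseEpochSeconds(String text, String pattern, boolean lenient) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.US);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed session time zone
        formatter.setLenient(lenient);                      // false = strict parsing (assumed)
        try {
            return formatter.parse(text).getTime() / 1000L;
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseEpochSeconds("20201", "yyyyww", false)); // null
        System.out.println(parseEpochSeconds("20202", "yyyyww", false)); // 1578182400
        System.out.println(parseEpochSeconds("20201", "yyyyww", true));  // 1577577600
    }
}
```

Under these assumptions, week 1 of 2020 in the US locale starts on Sunday 2019-12-29. The strict parser rejects the input because the resolved year (2019) disagrees with the parsed year (2020), while the lenient parser accepts it. This suggests the failure may depend on where week 1 starts rather than on the year being a leap year.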

 

*Spark-3.0:* In my tests, Spark no longer uses the legacy java.text.SimpleDateFormat but the Java date/time API (java.time), which seems to expect a valid date in a valid format. See:

 org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter#parse
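As a rough illustration of the stricter java.time behaviour, the sketch below feeds the same inputs to a plain DateTimeFormatter. This is not Spark's Iso8601TimestampFormatter, which may configure its formatter differently; the class and method names here are hypothetical.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class JavaTimeWeekPatternDemo {
    // Returns true if java.time can parse the text with the given pattern,
    // false if it throws DateTimeParseException.
    static boolean parses(String text, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.US);
        try {
            formatter.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("20201", "yyyyww")); // false
        System.out.println(parses("20202", "yyyyww")); // false
    }
}
```

With this plain formatter both inputs are rejected: the text has only one digit left after the four-digit year, so it does not fit the two-digit week field.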

  was:
 
{code:java}
scala> spark.sql("select unix_timestamp('20201', 'yyyyww')").show();
+-----------------------------+
|unix_timestamp(20201, yyyyww)|
+-----------------------------+
|                         null|
+-----------------------------+
 
scala> spark.sql("select unix_timestamp('20202', 'yyyyww')").show();
+-----------------------------+
|unix_timestamp(20202, yyyyww)|
+-----------------------------+
|                   1578182400|
+-----------------------------+
 
{code}
 

 

This seems to happen only for leap years. Digging deeper, it seems that Spark uses java.text.SimpleDateFormat and tries to parse the expression here:

[org.apache.spark.sql.catalyst.expressions.UnixTime#eval|https://github.com/hortonworks/spark2/blob/49ec35bbb40ec6220282d932c9411773228725be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L652]
{code:java}
formatter.parse(
  t.asInstanceOf[UTF8String].toString).getTime / 1000L
{code}
but the parse fails: SimpleDateFormat cannot parse the date and throws a java.text.ParseException ("Unparseable date"), which Spark handles silently, returning NULL.

 


> Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30688
>                 URL: https://issues.apache.org/jira/browse/SPARK-30688
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Rajkumar Singh
>            Priority: Major



