[ 
https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354655#comment-15354655
 ] 

Giuliano Caliari commented on SPARK-16239:
------------------------------------------

The issue seems to be that dates are converted using the daylight saving offset
sampled at the wrong instant. It happens even when converting to a Unix
timestamp: note the difference between 2014-10-04 and 2014-10-05 below.
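The 23-hour day itself is real: Australia/Sydney moves to DST at 02:00 on 2014-10-05, so that local day is one hour short. A quick sketch with plain java.time (no Spark required) shows where the short gap *should* fall: between the midnights of 2014-10-05 and 2014-10-06, whereas Spark's numbers put it between 2014-10-04 and 2014-10-05.

```scala
import java.time.{LocalDate, ZoneId}

val sydney = ZoneId.of("Australia/Sydney")

// Epoch seconds of local midnight in Sydney for a given date.
def epochSec(d: String): Long =
  LocalDate.parse(d).atStartOfDay(sydney).toEpochSecond

// DST starts at 02:00 on 2014-10-05, so that local day is 23 hours long.
println((epochSec("2014-10-05") - epochSec("2014-10-04")) / 3600) // 24
println((epochSec("2014-10-06") - epochSec("2014-10-05")) / 3600) // 23

// Correct midnight of 2014-10-05 is 1412431200 (offset +10:00 still applies
// at midnight); Spark's table below shows 1412427600, one hour earlier.
println(epochSec("2014-10-05")) // 1412431200
```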

{code:title=Date.scala|borderStyle=solid}
import org.apache.spark.sql.types.TimestampType
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window._
import sqlContext.implicits._ // pre-imported in spark-shell; needed for toDF

val dates = Array(
  "2014-10-03", "2014-10-04", "2014-10-05", "2014-10-06", "2014-10-07",
  "2015-04-03", "2015-04-04", "2015-04-05", "2015-04-06",
  "2015-10-02", "2015-10-03", "2015-10-04")

val wSpec = orderBy("txn_date")

sc.parallelize(dates)
  .toDF("txn_date")
  .select(col("txn_date").cast("Date"))
  .select(
    col("txn_date"),
    unix_timestamp(col("txn_date")).cast("Long").alias("txn_date_millis"),
    from_unixtime(unix_timestamp(col("txn_date"))).alias("txn_date_millis_date")
  )
  .select(
    col("*"),
    // unix_timestamp returns seconds, so dividing by 3600 gives hours
    ((col("txn_date_millis") - lag("txn_date_millis", 1).over(wSpec)) / 60 / 60)
      .alias("previous_day_diff_in_hours")
  )
  .show()
{code}

Results:
{noformat}
+----------+---------------+--------------------+--------------------------+
|  txn_date|txn_date_millis|txn_date_millis_date|previous_day_diff_in_hours|
+----------+---------------+--------------------+--------------------------+
|2014-10-03|     1412258400| 2014-10-03 00:00:00|                      null|
|2014-10-04|     1412344800| 2014-10-04 00:00:00|                      24.0|
|2014-10-05|     1412427600| 2014-10-04 23:00:00|                      23.0|
|2014-10-06|     1412514000| 2014-10-06 00:00:00|                      24.0|
|2014-10-07|     1412600400| 2014-10-07 00:00:00|                      24.0|
|2015-04-03|     1427979600| 2015-04-03 00:00:00|                    4272.0|
|2015-04-04|     1428066000| 2015-04-04 00:00:00|                      24.0|
|2015-04-05|     1428156000| 2015-04-05 01:00:00|                      25.0|
|2015-04-06|     1428242400| 2015-04-06 00:00:00|                      24.0|
|2015-10-02|     1443708000| 2015-10-02 00:00:00|                    4296.0|
|2015-10-03|     1443794400| 2015-10-03 00:00:00|                      24.0|
|2015-10-04|     1443877200| 2015-10-03 23:00:00|                      23.0|
+----------+---------------+--------------------+--------------------------+
{noformat}
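For comparison, the shifted value 1412427600 (rendered as 2014-10-04 23:00 local) can be reproduced outside Spark. This is only a sketch of the suspected mechanism, not Spark's actual code path: compute epoch seconds from days-since-epoch using a zone offset sampled at an instant *after* the 02:00 transition, then render the result with the correct zone rules.

```scala
import java.time.{Instant, LocalDate, ZoneId}

val sydney = ZoneId.of("Australia/Sydney")
val date   = LocalDate.parse("2014-10-05")

// Correct conversion: ask the zone rules for the offset at local midnight.
val correct = date.atStartOfDay(sydney).toEpochSecond // 1412431200

// Suspected buggy conversion: days-since-epoch * 86400, minus an offset
// sampled at noon of that day (after the 02:00 transition, so +11:00).
val offsetAtNoon = sydney.getRules
  .getOffset(Instant.ofEpochSecond(date.toEpochDay * 86400L + 43200L))
val buggy = date.toEpochDay * 86400L - offsetAtNoon.getTotalSeconds // 1412427600

// Rendering the buggy instant back in Sydney time lands on the previous day,
// matching the txn_date_millis_date column above.
println(Instant.ofEpochSecond(buggy).atZone(sydney)) // 2014-10-04T23:00+10:00
```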


> SQL issues with cast from date to string around daylight savings time
> ---------------------------------------------------------------------
>
>                 Key: SPARK-16239
>                 URL: https://issues.apache.org/jira/browse/SPARK-16239
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Glen Maisey
>            Priority: Critical
>
> Hi all,
> I have a dataframe with a date column. When I cast to a string using the 
> spark sql cast function it converts it to the wrong date on certain days. 
> Looking into it, it occurs once a year when summer daylight savings starts.
> I've tried to show this issue in the code below. The toString() rendering
> works correctly whereas the cast does not.
> Unfortunately my users are writing SQL rather than Scala dataframe code, so 
> the toString() workaround does not apply to them. This was actually picked up 
> where a user wrote something like "SELECT date1 UNION ALL SELECT date2" where 
> date1 was a string and date2 was a date type. The date must be implicitly 
> converted to a string, which produces the wrong result.
> I'm in the Australia/Sydney timezone (see the time changes here 
> http://www.timeanddate.com/time/zone/australia/sydney) 
> val dates = Array(
>   "2014-10-03", "2014-10-04", "2014-10-05", "2014-10-06",
>   "2015-10-02", "2015-10-03", "2015-10-04", "2015-10-05")
> val df = sc.parallelize(dates)
>             .toDF("txn_date")
>             .select(col("txn_date").cast("Date"))
> df.select(
>         col("txn_date"),
>         col("txn_date").cast("Timestamp").alias("txn_date_timestamp"),
>         col("txn_date").cast("String").alias("txn_date_str_cast"),
>         col("txn_date".toString()).alias("txn_date_str_toString")
>         )
>     .show()
> +----------+--------------------+-----------------+---------------------+
> |  txn_date|  txn_date_timestamp|txn_date_str_cast|txn_date_str_toString|
> +----------+--------------------+-----------------+---------------------+
> |2014-10-03|2014-10-02 14:00:...|       2014-10-03|           2014-10-03|
> |2014-10-04|2014-10-03 14:00:...|       2014-10-04|           2014-10-04|
> |2014-10-05|2014-10-04 13:00:...|       2014-10-04|           2014-10-05|
> |2014-10-06|2014-10-05 13:00:...|       2014-10-06|           2014-10-06|
> |2015-10-02|2015-10-01 14:00:...|       2015-10-02|           2015-10-02|
> |2015-10-03|2015-10-02 14:00:...|       2015-10-03|           2015-10-03|
> |2015-10-04|2015-10-03 13:00:...|       2015-10-03|           2015-10-04|
> |2015-10-05|2015-10-04 13:00:...|       2015-10-05|           2015-10-05|
> +----------+--------------------+-----------------+---------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
