[ https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353984#comment-15353984 ]
Glen Maisey commented on SPARK-16239:
-------------------------------------

As a layman, I would not expect a Date type to need a timezone, nor would I expect Date values to be affected by timezone or daylight saving considerations at all (I like to think of a date as an integer counted from a fixed date in the past). I would expect this sort of behaviour only if I were using a Timestamp or Datetime datatype.

> SQL issues with cast from date to string around daylight savings time
> ---------------------------------------------------------------------
>
>                 Key: SPARK-16239
>                 URL: https://issues.apache.org/jira/browse/SPARK-16239
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Glen Maisey
>            Priority: Critical
>
> Hi all,
> I have a dataframe with a date column. When I cast it to a string using the
> Spark SQL cast function, it converts it to the wrong date on certain days.
> Looking into it, this occurs once a year, when summer daylight savings starts.
> I've tried to show the issue in the code below. The toString() function works
> correctly whereas the cast does not.
> Unfortunately my users are writing SQL code rather than Scala dataframes, so
> this workaround does not apply. This was actually picked up where a user was
> writing something like "SELECT date1 UNION ALL select date2", where date1 was
> a string and date2 was a date type. It must be implicitly converting the date
> to a string, which gives this error.
> I'm in the Australia/Sydney timezone (see the time changes here
> http://www.timeanddate.com/time/zone/australia/sydney)
> import org.apache.spark.sql.functions.col
>
> val dates = Array(
>   "2014-10-03", "2014-10-04", "2014-10-05", "2014-10-06",
>   "2015-10-02", "2015-10-03", "2015-10-04", "2015-10-05")
> // Build a single-column dataframe and cast the strings to DateType.
> val df = sc.parallelize(dates)
>   .toDF("txn_date")
>   .select(col("txn_date").cast("Date"))
> df.select(
>   col("txn_date"),
>   col("txn_date").cast("Timestamp").alias("txn_date_timestamp"),
>   col("txn_date").cast("String").alias("txn_date_str_cast"),
>   // toString() is applied to the column name here, so this is the date column itself
>   col("txn_date".toString()).alias("txn_date_str_toString")
> ).show()
> +----------+--------------------+-----------------+---------------------+
> |  txn_date|  txn_date_timestamp|txn_date_str_cast|txn_date_str_toString|
> +----------+--------------------+-----------------+---------------------+
> |2014-10-03|2014-10-02 14:00:...|       2014-10-03|           2014-10-03|
> |2014-10-04|2014-10-03 14:00:...|       2014-10-04|           2014-10-04|
> |2014-10-05|2014-10-04 13:00:...|       2014-10-04|           2014-10-05|
> |2014-10-06|2014-10-05 13:00:...|       2014-10-06|           2014-10-06|
> |2015-10-02|2015-10-01 14:00:...|       2015-10-02|           2015-10-02|
> |2015-10-03|2015-10-02 14:00:...|       2015-10-03|           2015-10-03|
> |2015-10-04|2015-10-03 13:00:...|       2015-10-03|           2015-10-04|
> |2015-10-05|2015-10-04 13:00:...|       2015-10-05|           2015-10-05|
> +----------+--------------------+-----------------+---------------------+
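
To illustrate the "date as an integer" view from the comment, here is a minimal Scala sketch using only java.time (it is not Spark code). The object and helper names (DateAsEpochDay, naiveLocalMidnight, localMidnight, dateIn) are hypothetical. The "naive" conversion is an assumed example of how a day-to-instant conversion could pick the wrong UTC offset around a daylight saving transition; the report does not confirm that this is what the Spark cast does internally.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object DateAsEpochDay {
  val sydney: ZoneId = ZoneId.of("Australia/Sydney")

  // Timezone-free formatting: a date is just a day count since 1970-01-01.
  def format(days: Long): String = LocalDate.ofEpochDay(days).toString

  // Hypothetical naive conversion: the zone offset is looked up at midnight UTC
  // of the day instead of at local midnight, so it can be off by the DST shift.
  def naiveLocalMidnight(days: Long, zone: ZoneId): Instant = {
    val utcMidnight = Instant.ofEpochSecond(days * 86400L)
    val offset = zone.getRules.getOffset(utcMidnight)
    utcMidnight.minusSeconds(offset.getTotalSeconds.toLong)
  }

  // Zone-aware conversion: java.time resolves local midnight for the given zone.
  def localMidnight(days: Long, zone: ZoneId): Instant =
    LocalDate.ofEpochDay(days).atStartOfDay(zone).toInstant

  // Calendar date that an instant falls on in the given zone.
  def dateIn(instant: Instant, zone: ZoneId): LocalDate =
    instant.atZone(zone).toLocalDate

  def main(args: Array[String]): Unit = {
    for (s <- Seq("2014-10-04", "2014-10-05", "2014-10-06")) {
      val days = LocalDate.parse(s).toEpochDay
      val naive = dateIn(naiveLocalMidnight(days, sydney), sydney)
      val aware = dateIn(localMidnight(days, sydney), sydney)
      println(s"${format(days)} naive=$naive zoneAware=$aware")
    }
    // The 2014-10-05 line prints: 2014-10-05 naive=2014-10-04 zoneAware=2014-10-05
  }
}
```

Under these assumptions only the day on which daylight saving starts comes back one day early, and the naive instant for 2014-10-05 is 2014-10-04T13:00:00Z, which lines up with the 13:00 values in the reported output, though that may be coincidental. The integer-only path (format) never consults a timezone, which is the behaviour the comment expects from a Date type.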