[ https://issues.apache.org/jira/browse/SPARK-21613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110719#comment-16110719 ]

Florentino Sainz edited comment on SPARK-21613 at 8/2/17 1:19 PM:
------------------------------------------------------------------

I've updated the sample notebook to show this use case; it's not working on a 
"simple" WHERE clause either:

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4270191918950108/1394884673460602/1201575579760496/latest.html

{code}
df = sqlContext.createDataFrame([("20-AUG-2017", "11:00:18.111"), ("20-AUG-2017", "20:11:18.111"), ("20-AUG-2017", "03:11:18.222"),
                                 ("20-JAN-2017", "20:11:18.222"), ("20-JAN-2017", "03:11:18.222"), ("20-AUG-2018", "03:11:18.222")],
                                ('tick_date', 'tick_time'))
df.registerTempTable("test")

sqlContext.sql('SELECT *, unix_timestamp(tick_date, "dd-MMM-YYYY") as fecha from TEST WHERE 1483228800 = unix_timestamp(tick_date, "dd-MMM-YYYY")').show()
{code}

As you can see, several of those timestamps match 1483228800 (which 
corresponds to 1-JAN-2017), even though none of my dates is that date.
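For reference, the constant in that WHERE clause can be checked against the epoch directly. A plain-Python sketch (stdlib only, not PySpark) confirming that 1483228800 is 2017-01-01 00:00:00 UTC, i.e. the very start of the year — which is what a week-based-year pattern resolved without any week fields would produce (my assumption about the cause, not verified against Spark here):

```python
from datetime import datetime, timezone

# The epoch value every row collapses to:
ts = datetime.fromtimestamp(1483228800, tz=timezone.utc)
print(ts)  # 2017-01-01 00:00:00+00:00

# Python's strptime, with a plain calendar-year directive,
# parses the same strings as expected:
print(datetime.strptime("20-AUG-2017", "%d-%b-%Y"))  # 2017-08-20 00:00:00
```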


was (Author: florentino):
One addition: in my main development, this unix_timestamp is in the WHERE 
clause (for now it's much simpler) and it seems to work fine there... the 
problem appears only when selecting the column.

> Wrong unix_timestamp when parsing Dates
> ---------------------------------------
>
>                 Key: SPARK-21613
>                 URL: https://issues.apache.org/jira/browse/SPARK-21613
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.1, 2.1.0
>            Reporter: Florentino Sainz
>
> Hi,
> I'm trying to perform some SQL queries on my database and I'm getting very 
> weird behaviour:
> sqlContext.sql('SELECT date, time, to_utc_timestamp(from_unixtime(unix_timestamp(CONCAT(date, " ", time), "dd-MMM-YYYY HH:mm:ss")), "Europe/Madrid") FROM MYTABLE WHERE YEAR=2017 AND MONTH=01 LIMIT 10').show()
> Running this query, no matter the month, we always get the same TIMESTAMP 
> (corresponding to 01/01/2017), as if month and day were not being processed 
> at all. I've tested it on 1.6 and 2.1 (on my company's cluster and on 
> Databricks). I also tested with an invalid month (e.g. "JJA"), which returns 
> NULL, so the field is being processed at some point.
> I've isolated the problem with the following snippet (not using Spark SQL 
> directly). I also tested using numeric months (first part of the snippet), 
> just in case, but there was no difference at all.
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4270191918950108/1394884673460602/1201575579760496/latest.html
> {code}
> from pyspark.sql.functions import concat, col, lit, from_utc_timestamp, to_utc_timestamp, unix_timestamp, from_unixtime
>
> df = sqlContext.createDataFrame([("11-08-2016", "20:11:18.222"), ("20-11-2016", "20:11:18.222")], ('Date', 'Time'))
> df.select('Date', unix_timestamp(timestamp=col("Date"), format="dd-MM-YYYY")).show()
> df.select('Date', to_utc_timestamp(from_unixtime(unix_timestamp(concat(col("Date"), lit(" "), col("Time")), "dd-MM-YYYY HH:mm:ss")), "Europe/Madrid")).show()
>
> df = sqlContext.createDataFrame([("20-SEP-2017", "20:00:18.111"), ("20-OCT-2017", "20:11:18.222"), ("11-AUG-2016", "20:11:18.222"), ("20-OCT-2016", "20:11:18.222")], ('Date', 'Time'))
> df.select('Date', unix_timestamp(timestamp=col("Date"), format="dd-MMM-YYYY")).show()
> df.select('Date', to_utc_timestamp(from_unixtime(unix_timestamp(concat(col("Date"), lit(" "), col("Time")), "dd-MMM-YYYY HH:mm:ss")), "Europe/Madrid")).show()
> {code}
> Apart from the bug report, can anyone suggest a workaround? (I'm trying to 
> use DataFrames, as my data size is quite big and I want to avoid serializing 
> my items back into Python at this stage.)
> Thanks,
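One possible workaround, assuming the cause is Java's SimpleDateFormat week-year pattern (uppercase `YYYY`) rather than the calendar year (`yyyy`): use `"dd-MMM-yyyy HH:mm:ss"` in the format strings above. The distinction can be illustrated with plain stdlib Python, whose roughly analogous ISO week-year directive `%G` refuses to resolve without week fields, where Java's SimpleDateFormat silently resolves the date anyway:

```python
from datetime import datetime

# Calendar year (%Y, like Java's 'yyyy'): parses as expected.
print(datetime.strptime("20-AUG-2017", "%d-%b-%Y"))  # 2017-08-20 00:00:00

# ISO week-based year (%G, roughly like Java's 'YYYY'):
# Python raises unless ISO week (%V) and weekday directives are also given,
# whereas SimpleDateFormat resolves the date anyway.
try:
    datetime.strptime("2017", "%G")
except ValueError as exc:
    print("refused:", exc)
```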



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
