[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

Hyukjin Kwon (JIRA) Thu, 30 Mar 2017 18:21:53 -0700

    [ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950149#comment-15950149 ]


Hyukjin Kwon commented on SPARK-20152:
--------------------------------------

I think the correct usage is as below:

{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res15: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

I should probably have left some comments there. When I introduced this in 
SPARK-16216, I used {{ZZ}} as specified in {{FastDateFormat}} to support "ISO 
8601 extended format time zones" (see 
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/FastDateFormat.html).
 I am sorry, I tend to trust the Apache ones more ... maybe I should have used 
{{SimpleDateFormat}} with a thread-local instead.
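
For reference, the thread-local alternative mentioned above could look roughly 
like the sketch below. It is written in Java because {{SimpleDateFormat}} is a 
JDK class. The class name {{ThreadLocalTimestampParser}} and the method 
{{parseMillis}} are made up for illustration; this is not what Spark does.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

final class ThreadLocalTimestampParser {
    // SimpleDateFormat is not thread-safe, so each thread lazily gets its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"));

    // Hypothetical helper: returns epoch milliseconds, or throws if the text does not match.
    static long parseMillis(String text) {
        try {
            return FORMAT.get().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        // "Z" and "+09:00" are both accepted by XXX and denote the same instant here.
        System.out.println(parseMillis("2017-03-21T00:00:00.000Z"));      // prints 1490054400000
        System.out.println(parseMillis("2017-03-21T09:00:00.000+09:00")); // prints 1490054400000
    }
}
```

Printing epoch milliseconds rather than {{java.util.Date}} keeps the output 
independent of the JVM default time zone.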

After this got merged, I realised that {{FastDateFormat}} seemed to have a bug in 
supporting the {{XXX}} format specified in 
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html (see 
https://issues.apache.org/jira/browse/LANG-1101), and it seems to be fixed in 3.4.

IIRC, I used this format for that reason, and the commons-lang3 version was 
3.3.2 at that time. A few months later, it was bumped up in SPARK-17985, so 
this should be fixed now and I think you can use {{XXX}} as below:

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res0: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

The related test was added in commons here - 
https://github.com/apache/commons-lang/commit/bdb074610c87a210ea4c0d91d579cb4558f4b19f

To cut this short, I think this issue is resolvable, and I think we can now 
change the default format to use {{XXX}} instead of {{ZZ}}, which is 
{{FastDateFormat}}-specific to my knowledge.
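
As an aside, one thing that may be worth checking when trying {{XXX}} with the 
reporter's data: the sample value quoted in the issue description has no 
fractional seconds, so a pattern containing {{.SSS}} would presumably not match 
it regardless of the zone letters. The sketch below only exercises the JDK 
{{SimpleDateFormat}}, not Spark's CSV code path, and the class 
{{TimestampPatternCheck}} and helper {{tryParse}} are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.OptionalLong;

final class TimestampPatternCheck {
    // Hypothetical helper: epoch millis if `text` parses with `pattern`, empty otherwise.
    static OptionalLong tryParse(String pattern, String text) {
        try {
            return OptionalLong.of(new SimpleDateFormat(pattern).parse(text).getTime());
        } catch (ParseException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        String sample = "03-21-2017T03:30:02Z"; // value from the issue description

        // The pattern expects ".SSS" after the seconds, but the sample has none.
        System.out.println(tryParse("MM-dd-yyyy'T'HH:mm:ss.SSSXXX", sample).isPresent()); // prints false

        // Without the milliseconds part, the trailing "Z" is read as a UTC offset.
        System.out.println(tryParse("MM-dd-yyyy'T'HH:mm:ssXXX", sample).isPresent()); // prints true
    }
}
```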


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20152
>                 URL: https://issues.apache.org/jira/browse/SPARK-20152
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> the expected result is that TimeColumn should be of "TimestampType" and should 
> consider the time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> the expected result is that TimeColumn should consider the time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

