[
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950149#comment-15950149
]
Hyukjin Kwon commented on SPARK-20152:
--------------------------------------
I think the correct usage is as below:
{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res15: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}
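The KST in that output is only the JVM default time zone used when printing
the {{Date}}; the parsed instant is the same everywhere. As a rough sketch
(the object and method names here are hypothetical, only for illustration),
comparing epoch milliseconds and formatting back in UTC makes this easier to
check regardless of where the REPL runs:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

// Hypothetical helper for illustration only; not part of Spark or commons-lang.
object XxxZoneCheck {
  private val Pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

  // Parse an ISO 8601 string to epoch milliseconds. The offset in the input
  // ("Z", "+09:00", ...) determines the instant, not the JVM default zone.
  def toEpochMillis(s: String): Long =
    new SimpleDateFormat(Pattern, Locale.US).parse(s).getTime

  // Format epoch milliseconds in UTC so the output does not depend on the
  // JVM default zone. A zero offset is written as "Z" for the XXX pattern.
  def renderUtc(epochMillis: Long): String = {
    val out = new SimpleDateFormat(Pattern, Locale.US)
    out.setTimeZone(TimeZone.getTimeZone("UTC"))
    out.format(new Date(epochMillis))
  }
}
```

With this, {{"2017-03-21T00:00:00.000Z"}} and
{{"2017-03-21T09:00:00.000+09:00"}} should parse to the same epoch value.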
I should probably have left some comments there. When I introduced this in
SPARK-16216, I used {{ZZ}} as specified in {{FastDateFormat}} to support "ISO
8601 extended format time zones" (see
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/FastDateFormat.html).
I am sorry, I tend to trust the Apache libraries more ... maybe I should have
used {{SimpleDateFormat}} with a thread-local instead.
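For reference, the thread-local alternative could look roughly like the
sketch below. This is only an illustration of the idea, not what Spark does;
the object name {{ThreadLocalIso8601}} is made up. {{SimpleDateFormat}} is not
thread-safe, so each thread gets its own instance:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

// Hypothetical helper for illustration only; not part of Spark.
object ThreadLocalIso8601 {
  private val Pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

  // SimpleDateFormat keeps mutable state while parsing, so sharing one
  // instance across threads is unsafe. ThreadLocal lazily creates one
  // instance per thread and reuses it on later calls from that thread.
  private val formatter: ThreadLocal[SimpleDateFormat] =
    new ThreadLocal[SimpleDateFormat] {
      override def initialValue(): SimpleDateFormat =
        new SimpleDateFormat(Pattern, Locale.US)
    }

  // Parse using the calling thread's own formatter instance.
  def parse(s: String): Date = formatter.get().parse(s)
}
```

The trade-off is one formatter allocation per thread in exchange for not
needing any locking around {{parse}}.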
After this was merged, I realised that {{FastDateFormat}} seemed to have a bug
in supporting the {{XXX}} format specified in
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html (see
https://issues.apache.org/jira/browse/LANG-1101), which seems to be fixed in 3.4.
IIRC, I used {{ZZ}} for that reason, as the commons-lang3 version was 3.3.2 at
that time. A few months later, it was bumped up in SPARK-17985, so the bug
should now be fixed and I think you can use {{XXX}} as below:
{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat
scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res0: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}
The related test was added in commons here -
https://github.com/apache/commons-lang/commit/bdb074610c87a210ea4c0d91d579cb4558f4b19f
To cut this short, I think this issue is resolvable: we can now change the
default format to use {{XXX}} instead of {{ZZ}}, which is
{{FastDateFormat}}-specific to my knowledge.
> Time zone is not respected while parsing csv for timeStampFormat
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the
> "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File:
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but
> expected result is TimeColumn should be of "TimestampType" and should
> consider time zone for manipulation
> Source code2:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but
> expected result is TimeColumn should consider time zone for manipulation