[
https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496579#comment-15496579
]
Hyukjin Kwon commented on SPARK-17545:
--------------------------------------
Thank you both for your feedback. I am okay with fixing this and supporting
some common quirky cases in general.
However, I'd like to note that we should probably avoid supporting further
quirky cases inside {{DateTimeUtils.stringToTime}}.
More specifically, we should avoid relying on
{{DatatypeConverter.parseDateTime(...)}}, because an issue was already
identified in it - https://github.com/apache/spark/pull/14279#issuecomment-233887751
If we allow only the strict ISO 8601 format by default and handle other cases
via {{timestampFormat}} and {{dateFormat}}, the problematic call above is never
reached; but if we start handling more quirky cases there, we carry the
potential problems from 2.0 forward as well.
I kept those usages only for backward compatibility, and I would personally
prefer not to add new logic there.
To cut this short, I am okay with adding this case there if we fix the issue
above at the same time, or if this case turns out to be quite common.
Otherwise, I'd rather stay against this (although it is not my place to decide
what goes into Spark) and instead promote the use of {{timestampFormat}} and
{{dateFormat}}.
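For reference, the workaround above maps to SimpleDateFormat-style patterns,
where a single {{Z}} matches RFC 822 zone offsets that have no colon. This is a
minimal JDK-only sketch (no Spark code involved; the option names
{{timestampFormat}}/{{dateFormat}} belong to the CSV reader, and a pattern like
the one below is what one would pass to them):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimestampFormatSketch {
    public static void main(String[] args) throws ParseException {
        // SimpleDateFormat's "Z" matches RFC 822 zone offsets without a colon,
        // e.g. "-0500", which is exactly the variant from this issue.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
        Date d = fmt.parse("2015-07-20T15:09:23.736-0500");
        System.out.println(d.getTime()); // epoch milliseconds
    }
}
```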
> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
> Key: SPARK-17545
> URL: https://issues.apache.org/jira/browse/SPARK-17545
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column that contains a variant ISO 8601
> timestamp whose zone offset has no colon, casting to Timestamp fails.
> Here's some simple example CSV content.
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
> at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
> at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
> at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
> at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
> at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
> at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related: I believe the Python standard library can produce this form
> of zone offset, and the system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
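The strictness at issue can be made concrete with a JDK-only sketch. Note this
uses {{java.time}} as a stand-in for illustration; Spark 2.0's actual failing
path goes through {{DatatypeConverter.parseDateTime}} / Xerces, as the stack
trace above shows. A strict ISO 8601 offset parser demands a colon
({{-06:00}}), while an explicit pattern with a single {{Z}} accepts the
colon-free form from the CSV sample:

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class OffsetColonDemo {
    public static void main(String[] args) {
        String ts = "2015-11-21T23:15:01.499-0600";

        // Strict ISO 8601 offset parsing requires a colon ("-06:00"),
        // so the colon-free offset is rejected.
        try {
            OffsetDateTime.parse(ts, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeParseException e) {
            System.out.println("strict parser rejected: " + ts);
        }

        // A single "Z" pattern letter accepts RFC 822 style offsets
        // without a colon, such as "-0600".
        DateTimeFormatter lenient =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
        OffsetDateTime parsed = OffsetDateTime.parse(ts, lenient);
        System.out.println(parsed.getOffset()); // -06:00
    }
}
```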
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)