[
https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509110#comment-15509110
]
Hyukjin Kwon commented on SPARK-17545:
--------------------------------------
Yes. This is because we introduced {{FastDateFormat}} there with the default
pattern {{yyyy-MM-dd'T'HH:mm:ss.SSSZZ}}, which accepts the zone offset both
with and without a colon:
{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat
scala> val f = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSZZ")
f: org.apache.commons.lang3.time.FastDateFormat = FastDateFormat[yyyy-MM-dd'T'HH:mm:ss.SSSZZ,ko_KR,Asia/Seoul]
scala> f.parse("2015-11-21T23:15:01.499-0600")
res0: java.util.Date = Sun Nov 22 14:15:01 KST 2015
scala> f.parse("2015-11-21T23:15:01.499-06:00")
res1: java.util.Date = Sun Nov 22 14:15:01 KST 2015
{code}
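For comparison, the same accept-both-forms behaviour can be reproduced with the JDK's {{java.time}} alone. This is a minimal sketch (the class and method names are mine, not Spark's or commons-lang3's): two optional offset sections, one for the colon form and one for the no-colon form.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Sketch: a formatter that, like the ZZ token in commons-lang3's
// FastDateFormat, accepts the zone offset both with a colon ("-06:00")
// and without one ("-0600").
public class FlexibleOffsetParser {
    static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd'T'HH:mm:ss.SSS")
            .optionalStart().appendOffset("+HH:MM", "Z").optionalEnd() // colon form
            .optionalStart().appendOffset("+HHMM", "Z").optionalEnd()  // no-colon form
            .toFormatter();

    static Instant parse(String s) {
        return OffsetDateTime.parse(s, FLEXIBLE).toInstant();
    }

    public static void main(String[] args) {
        // Both inputs resolve to the same instant, 2015-11-22T05:15:01.499Z.
        System.out.println(parse("2015-11-21T23:15:01.499-0600"));
        System.out.println(parse("2015-11-21T23:15:01.499-06:00"));
    }
}
```

If the first optional section fails partway through (e.g. on "-0600", where no colon follows the hours), the parser backtracks and tries the second, which is what makes both forms work.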
It also works in an end-to-end test -
https://github.com/apache/spark/pull/15147#issuecomment-247903603.
In more detail, the actual conversion happens in
https://github.com/apache/spark/blob/1dbb725dbef30bf7633584ce8efdb573f2d92bca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L265-L273
{code}
Try(options.timestampFormat.parse(datum).getTime * 1000L)
  .getOrElse {
    // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
    // compatibility.
    DateTimeUtils.stringToTime(datum).getTime * 1000L
  }
{code}
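As a standalone sketch of that try-then-fall-back flow, with plain JDK stand-ins rather than the Spark code itself ({{SimpleDateFormat}} plays the role of the configured {{timestampFormat}}, and the JDK's XML {{dateTime}} parser plays the role of {{DateTimeUtils.stringToTime}}; the class and method names are mine):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.DatatypeConfigurationException;

// Sketch of the parse-then-fallback flow: try the configured timestamp
// format first, and only on failure fall back to the legacy-style parsing.
public class TimestampFallback {
    static long toMicros(String datum) {
        // Primary format: RFC 822 "Z" accepts "-0600" but not "-06:00".
        SimpleDateFormat primary = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
        try {
            return primary.parse(datum).getTime() * 1000L; // microseconds since epoch
        } catch (ParseException e) {
            try {
                // Fallback, analogous in spirit to DateTimeUtils.stringToTime(datum):
                // the XML Schema dateTime parser, which requires the colon form.
                return DatatypeFactory.newInstance()
                        .newXMLGregorianCalendar(datum)
                        .toGregorianCalendar()
                        .getTimeInMillis() * 1000L;
            } catch (DatatypeConfigurationException cfg) {
                throw new IllegalStateException(cfg);
            }
        }
    }
}
```

Because each parser accepts exactly the offset form the other rejects, both inputs succeed, one through each path; the real fix in Spark is the same shape, just with {{FastDateFormat}} as the primary.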
Before https://github.com/apache/spark/pull/14279, it was
https://github.com/apache/spark/blob/e1dc853737fc1739fbb5377ffe31fb2d89935b1f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L287
{code}
DateTimeUtils.stringToTime(datum).getTime * 1000L
{code}
It is true that {{DateTimeUtils.stringToTime(...)}} does not handle the
{{+0800}}-style offset, but after https://github.com/apache/spark/pull/14279
we try {{FastDateFormat}} first, which covers this case.
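To make the legacy failure concrete: the stack trace in the issue ends in the JDK's XML Schema {{dateTime}} parser (the {{XMLGregorianCalendarImpl}} frames), whose lexical grammar requires the colon in the offset. A minimal probe against that parser, assuming only the JDK (class and method names are mine):

```java
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.DatatypeConfigurationException;

// Probe: does the JDK's XML Schema dateTime parser (the parser behind the
// XMLGregorianCalendarImpl frames in the reported stack trace) accept a
// given lexical form?
public class XmlDateTimeProbe {
    static boolean xmlAccepts(String lexical) {
        try {
            DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            return true;
        } catch (IllegalArgumentException badLexicalForm) {
            return false; // rejected by the XML Schema dateTime grammar
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xmlAccepts("2015-11-21T23:15:01.499-06:00")); // true
        System.out.println(xmlAccepts("2015-11-21T23:15:01.499-0600"));  // false
    }
}
```

The no-colon form is rejected with {{IllegalArgumentException}}, which is exactly the exception in the reported stack trace.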
> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
> Key: SPARK-17545
> URL: https://issues.apache.org/jira/browse/SPARK-17545
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column containing an ISO 8601 variant
> that doesn't include a colon in the zone offset, casting to Timestamp fails.
> Here's a simple, example CSV content.
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
> at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
> at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
> at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
> at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
> at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
> at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
> at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related: I believe the Python standard library can produce this form
> of zone offset. The system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)