[ https://issues.apache.org/jira/browse/SPARK-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509110#comment-15509110 ]

Hyukjin Kwon commented on SPARK-17545:
--------------------------------------

Yes. This is because we introduced {{FastDateFormat}} there with the default 
pattern, {{yyyy-MM-dd'T'HH:mm:ss.SSSZZ}}.

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> val f = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSZZ")
f: org.apache.commons.lang3.time.FastDateFormat = FastDateFormat[yyyy-MM-dd'T'HH:mm:ss.SSSZZ,ko_KR,Asia/Seoul]

scala> f.parse("2015-11-21T23:15:01.499-0600")
res0: java.util.Date = Sun Nov 22 14:15:01 KST 2015

scala> f.parse("2015-11-21T23:15:01.499-06:00")
res1: java.util.Date = Sun Nov 22 14:15:01 KST 2015
{code}

It also works in an end-to-end test - 
https://github.com/apache/spark/pull/15147#issuecomment-247903603.
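
For anyone who wants to try it locally, here is a minimal sketch in spark-shell (the file path and sample data below are made up for illustration, not taken from the PR; it assumes the {{spark}} session provided by the shell):

{code}
// Hypothetical reproduction: write the sample CSV out, then read it back
// with schema inference enabled.
import java.nio.file.{Files, Paths}

val csv =
  """time
    |2015-07-20T15:09:23.736-0500
    |2015-11-21T23:15:01.499-0600
    |""".stripMargin
Files.write(Paths.get("/tmp/SPARK-17545.csv"), csv.getBytes("UTF-8"))

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/SPARK-17545.csv")

// With the default timestampFormat, the offset-without-colon rows should
// be inferred and parsed as timestamps.
df.printSchema()
df.show(false)
{code}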

In more detail, the actual conversion happens in 
https://github.com/apache/spark/blob/1dbb725dbef30bf7633584ce8efdb573f2d92bca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L265-L273

{code}
Try(options.timestampFormat.parse(datum).getTime * 1000L)
  .getOrElse {
    // If it fails to parse, then tries the way used in 2.0 and 1.x for
    // backwards compatibility.
    DateTimeUtils.stringToTime(datum).getTime * 1000L
  }
{code}
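
Put differently, the same fallback chain can be sketched outside of Spark like this ({{toMicros}} is just an illustrative name; per the stack trace in this issue, {{stringToTime}} delegates to {{DatatypeConverter.parseDateTime}} for ISO 8601 strings containing {{'T'}}):

{code}
import scala.util.Try
import javax.xml.bind.DatatypeConverter
import org.apache.commons.lang3.time.FastDateFormat

val format = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSZZ")

// Microseconds since epoch, mirroring the timestamp branch above.
def toMicros(datum: String): Long =
  Try(format.parse(datum).getTime * 1000L).getOrElse {
    // Pre-2.1 path: strict xs:dateTime, requires the colon in the offset.
    DatatypeConverter.parseDateTime(datum).getTimeInMillis * 1000L
  }

toMicros("2015-11-21T23:15:01.499-0600")  // handled by FastDateFormat
toMicros("2015-11-21T23:15:01.499-06:00") // handled by either path
{code}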

Before https://github.com/apache/spark/pull/14279, it was
https://github.com/apache/spark/blob/e1dc853737fc1739fbb5377ffe31fb2d89935b1f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L287

{code}
DateTimeUtils.stringToTime(datum).getTime * 1000L
{code}

It is true that {{DateTimeUtils.stringToTime(...)}} does not handle the 
{{+0800}}-style offset, but after https://github.com/apache/spark/pull/14279 we 
try {{FastDateFormat}} first, which seems to cover this case.
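
The failure in the old path is easy to reproduce directly against {{DatatypeConverter}}, which is the call at the bottom of the reporter's stack trace:

{code}
import javax.xml.bind.DatatypeConverter

// Valid xs:dateTime (colon in the offset) - parses fine:
DatatypeConverter.parseDateTime("2015-11-21T23:15:01.499-06:00")

// No colon in the offset - throws, as in the reporter's stack trace:
// java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
DatatypeConverter.parseDateTime("2015-11-21T23:15:01.499-0600")
{code}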

> Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17545
>                 URL: https://issues.apache.org/jira/browse/SPARK-17545
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nathan Beyer
>
> When parsing a CSV with a date/time column that contains a variant ISO 8601 
> timestamp that doesn't include a colon in the offset, casting to Timestamp fails.
> Here's some simple example CSV content.
> {quote}
> time
> "2015-07-20T15:09:23.736-0500"
> "2015-07-20T15:10:51.687-0500"
> "2015-11-21T23:15:01.499-0600"
> {quote}
> Here's the stack trace that results from processing this data.
> {quote}
> 16/09/14 15:22:59 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException: 2015-11-21T23:15:01.499-0600
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.XMLGregorianCalendarImpl.<init>(Unknown Source)
>       at org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(Unknown Source)
>       at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
>       at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
>       at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
>       at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:140)
>       at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
> {quote}
> Somewhat related: I believe Python's standard library can produce this form 
> of zone offset. The system I got the data from is written in Python.
> https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior


