[ 
https://issues.apache.org/jira/browse/SPARK-18072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18072.
-------------------------------
    Resolution: Cannot Reproduce

> empty/null Timestamp field
> --------------------------
>
>                 Key: SPARK-18072
>                 URL: https://issues.apache.org/jira/browse/SPARK-18072
>             Project: Spark
>          Issue Type: Question
>          Components: Input/Output
>    Affects Versions: 2.0.0
>         Environment: hadoop 2.7.1, ubuntu 15.10, databricks 1.5,  spark-csv 
> 1.5.0, scala 2.11.8
>            Reporter: marcin pekalski
>              Labels: csvparser, parquet, parquetWriter, timestamp
>
> I was asked by [~falaki] to create a JIRA issue here; it was previously reported 
> as a databricks/spark-csv issue on GitHub: 
> https://github.com/databricks/spark-csv/issues/388#issuecomment-255631718
> I have a problem with Spark 2.0.0, spark-csv 1.5.0, and Scala 2.11.8.
> I have a CSV file that I want to convert to Parquet. There is a column with 
> timestamps, and some of them are missing: they are empty strings (without 
> quotes, and not even a space; since it is the last column, the newline 
> follows immediately). I get the following exception:
> {code}
> 16/10/23 02:46:08 ERROR Utils: Aborting task
> java.lang.IllegalArgumentException
>       at java.sql.Date.valueOf(Date.java:143)
>       at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:137)
>       at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
>       at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:115)
>       at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:84)
> ...
> {code}
> The options I use when reading the CSV:
> {code}
> "delimiter" -> ","
> "header" -> "true"
> "inferSchema" -> "true"
> "treatEmptyValuesAsNulls" ->"true"
> "nullValue"->""
> {code}
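> For context, a minimal sketch of how these options are wired up in a Spark 2.0 
> job (the session, app name, and paths here are my placeholders, not the real job):
```scala
// Sketch only: app name and paths are placeholder assumptions.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate()

val df = spark.read
  .option("delimiter", ",")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("nullValue", "")
  .csv("/path/to/input.csv")          // placeholder path

df.write.parquet("/path/to/output")   // fails when an empty timestamp is cast
```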
> The execution goes through *CSVInferSchema.scala* (lines 284-287) in 
> *spark-sql_2.11-2.0.0-sources.jar*:
> {code}
>       case _: TimestampType =>
>         // This one will lose microseconds parts.
>         // See https://issues.apache.org/jira/browse/SPARK-10681.
>         DateTimeUtils.stringToTime(datum).getTime  * 1000L
> {code}
> It invokes `Date.valueOf(s)` in *DateTimeUtils.scala* (in 
> *spark-catalyst_2.11-2.0.0-sources.jar*), which then throws the exception 
> from *java.sql.Date.valueOf*.
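> The underlying JDK behaviour can be reproduced without Spark at all; this 
> standalone sketch (mine, not from the Spark code base) shows that 
> `java.sql.Date.valueOf` rejects the empty string a missing CSV field produces:
```scala
// Standalone repro (no Spark needed): java.sql.Date.valueOf only accepts
// strings of the form "yyyy-[m]m-[d]d", so the empty string left by a
// missing CSV field makes it throw IllegalArgumentException.
object EmptyDateRepro {
  def main(args: Array[String]): Unit = {
    val throws =
      try { java.sql.Date.valueOf(""); false }
      catch { case _: IllegalArgumentException => true }
    println(s"empty string throws IllegalArgumentException: $throws")
  }
}
```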
> Is this a bug, am I doing something wrong, or is there a way to supply a 
> default value?
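> One possible workaround (my sketch, not an official fix; the column name `ts` 
> is an assumption): keep the fragile column as a string so inference never 
> casts it, then convert it by hand, mapping empty values to null:
```scala
// Sketch of a workaround: leave the timestamp column as StringType, then
// convert it explicitly so empty fields become null instead of throwing.
// `df` is the DataFrame read from CSV as above; "ts" is an assumed column name.
import org.apache.spark.sql.functions.{col, when, unix_timestamp}

val fixed = df.withColumn(
  "ts",
  when(col("ts") === "", null)                           // missing value -> null
    .otherwise(unix_timestamp(col("ts")).cast("timestamp"))
)
fixed.write.parquet("/path/to/output")                   // placeholder path
```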



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
