Lev created SPARK-16981:
---------------------------
Summary: For CSV files nullValue is not respected for Date/Time data type
Key: SPARK-16981
URL: https://issues.apache.org/jira/browse/SPARK-16981
Project: Spark
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Lev
Test case:
import org.apache.spark.sql.types._

// col2 is the nullable timestamp column that triggers the failure
val struct = StructType(Seq(
  StructField("col1", StringType, true),
  StructField("col2", TimestampType, true),
  StructField("col3", StringType, true)))
val cq = sqlContext.readStream
  .format("csv")
  .option("nullValue", " ")
  .schema(struct)
  .load(s"somepath")
  .writeStream(....)
Content of the file:
"abc", ,"def"
Result:
The following exception is thrown:
scala.MatchError: java.lang.IllegalArgumentException: Timestamp format must be
yyyy-mm-dd hh:mm:ss[.fffffffff] (of class java.lang.IllegalArgumentException)
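Code analysis:
The problem is caused by the castTo method of the CSVTypeCast object.
For all data types except the temporal ones, it performs the following check:
if (datum == options.nullValue && nullable) {
  null
}
For the temporal types (date and timestamp) this check is missing, so the
nullValue string is handed to the date/time parser instead of becoming null.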
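
The analysis above suggests applying the nullValue check before dispatching on
the column type, so that temporal types are covered as well. The following is
only a minimal, self-contained sketch of that idea and not the actual Spark
code: ColumnType, StringCol, TimestampCol, Options and NullAwareCast are
hypothetical stand-ins introduced for illustration, and java.sql.Timestamp.valueOf
is assumed as the parser because it produces the same error message as in the
report.

```scala
import java.sql.Timestamp

// Hypothetical, simplified stand-ins for Spark's data types and CSV options.
sealed trait ColumnType
case object StringCol extends ColumnType
case object TimestampCol extends ColumnType

final case class Options(nullValue: String)

object NullAwareCast {
  // Sketch: the nullValue check runs once, before the type match,
  // so every type (including the temporal ones) gets the same treatment.
  def castTo(datum: String, tpe: ColumnType, nullable: Boolean, options: Options): Any =
    if (datum == options.nullValue && nullable) {
      null
    } else {
      tpe match {
        case StringCol    => datum
        // Throws IllegalArgumentException for input that is not a timestamp.
        case TimestampCol => Timestamp.valueOf(datum)
      }
    }
}
```

With this ordering, a call such as
NullAwareCast.castTo(" ", TimestampCol, nullable = true, Options(" "))
yields null instead of reaching the timestamp parser.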