[
https://issues.apache.org/jira/browse/SPARK-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419073#comment-15419073
]
Barry Becker commented on SPARK-17039:
--------------------------------------
There are literal ?'s in the datafile. The "nullValue" option indicates that
those ?'s should be read as null values. I also added the "dateFormat" option
which describes how the dates in the file should be read.
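To make the expected interaction of the two options concrete, here is a minimal sketch in plain Scala. It is not Spark's implementation: `NullAwareTimestamp` and `parse` are hypothetical names used only for illustration. The assumption is that the "nullValue" check should happen before the "dateFormat" parse, so that a ? never reaches the date parser.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical helper, NOT Spark code: it only illustrates the behaviour
// expected when "nullValue" and "dateFormat" are used together.
object NullAwareTimestamp {
  val NullValue  = "?"                      // same value as option("nullValue", "?")
  val DateFormat = "yyyy-MM-dd'T'HH:mm:ss"  // same value as option("dateFormat", ...)

  // Returns None for the null marker; otherwise parses the field as a timestamp.
  def parse(field: String): Option[Timestamp] =
    if (field == NullValue) {
      None // the null check comes first, so "?" is never handed to the parser
    } else {
      // SimpleDateFormat is not thread-safe, so a new instance is created per call.
      val fmt = new SimpleDateFormat(DateFormat)
      Some(new Timestamp(fmt.parse(field).getTime))
    }
}
```

With this ordering, `NullAwareTimestamp.parse("?")` yields `None`, whereas passing "?" straight to `SimpleDateFormat.parse` throws a `java.text.ParseException`, which matches the error in the report.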
Let me try to provide more information so you can reproduce.
Here is the schema that I am specifying (dfSchema above):
{code}
StructType(StructField(string normal,StringType,true),
StructField(Years,TimestampType,true),
StructField(Months,TimestampType,true),
StructField(WeekDays,TimestampType,true),
StructField(Days,TimestampType,true),
StructField(DaysWithNull,TimestampType,true),
StructField(Hours,TimestampType,true),
StructField(Minutes,TimestampType,true),
StructField(normal dates,TimestampType,true),
StructField(Wide Range Dates,TimestampType,true),
StructField(Narrow,TimestampType,true),
StructField(Far Future,TimestampType,true),
StructField(Mostly Null,TimestampType,true),
StructField(All Same Date,TimestampType,true),
StructField(Past/Future,TimestampType,true),
StructField(All nulls,TimestampType,true),
StructField(Seconds,TimestampType,true))
{code}
And here are the contents of the CSV data file (note that there are lots of
nulls). Reading this file worked in Spark 1.6.2 with the Databricks spark-csv
library as a dependency.
{code}
foo 2015-03-09T00:00:00 2015-03-09T00:00:00 2015-03-09T00:00:00
2015-03-09T00:00:00 2015-03-09T00:00:00 2015-03-09T00:00:00
2015-03-09T00:01:00 2007-11-09T00:00:00 1967-11-09T00:00:00
2015-03-09T12:00:00 2700-01-01T00:00:00 2015-03-09T00:00:00
2015-03-09T00:00:00 1983-03-09T00:00:00 ? 2015-03-09T12:01:00
bar 2016-03-09T00:00:00 2015-04-09T00:00:00 2015-03-10T00:00:00
2015-03-10T00:00:00 ? 2015-03-09T01:00:00 2015-03-09T00:03:00
2007-10-02T00:00:00 1987-10-02T00:00:00 2015-03-09T12:03:00
3701-01-01T00:00:00 2015-04-09T00:00:00 2015-03-09T00:00:00
1865-04-09T00:00:00 ? 2015-03-09T12:01:01
baz 2017-03-09T00:00:00 2015-05-09T00:00:00 2015-03-11T00:00:00
2015-03-11T00:00:00 2015-03-11T00:00:00 2015-03-09T02:00:00
2015-03-09T00:05:00 1999-04-04T03:00:00 1999-02-03T00:00:00
2015-03-09T12:08:00 4702-01-01T00:00:00 ? 2015-03-09T00:00:00
1777-05-09T00:00:00 ? 2015-03-09T12:01:03
but 2018-03-09T00:00:00 2015-06-09T00:00:00 2015-03-12T00:00:00
2015-03-12T00:00:00 2015-03-12T00:00:00 2015-03-09T03:00:00
2015-03-09T00:08:00 2025-10-10T00:00:00 2025-10-10T00:00:00
2015-03-09T12:10:00 4103-01-01T00:00:00 2015-06-09T00:00:00
2015-03-09T00:00:00 2089-06-09T00:00:00 ? 2015-03-09T12:01:05
fooo 2019-03-09T00:00:00 2015-07-09T00:00:00 2015-03-13T00:00:00
2015-03-13T00:00:00 2015-03-13T00:00:00 2015-03-09T04:00:00
2015-03-09T00:09:00 2004-02-23T00:00:00 2004-02-23T00:00:00
2015-03-09T12:15:00 4204-01-01T00:00:00 ? 2015-03-09T00:00:00
2125-07-09T00:00:00 ? 2015-03-09T12:01:07
bar 2020-03-09T00:00:00 2015-08-09T00:00:00 2015-03-16T00:00:00
2015-03-14T00:00:00 2015-03-14T00:00:00 2015-03-09T05:00:00
2015-03-09T00:12:00 2019-03-04T00:00:00 3019-03-04T00:00:00
2015-03-09T12:20:00 4305-01-01T00:00:00 2015-08-09T00:00:00
2015-03-09T00:00:00 2215-08-09T00:00:00 ? 2015-03-09T12:01:09
baz 2021-03-09T00:00:00 2015-09-09T00:00:00 2015-03-17T00:00:00
2015-03-15T00:00:00 2015-03-15T00:00:00 2015-03-09T06:00:00
2015-03-09T00:20:00 1999-04-04T02:34:00 ? 2015-03-09T12:25:00
4406-01-01T00:00:00 2015-09-09T00:00:00 2015-03-09T00:00:00
1754-09-09T00:00:00 ? 2015-03-09T12:01:11
but 2022-03-09T00:00:00 2015-10-09T00:00:00 2015-03-18T00:00:00
2015-03-16T00:00:00 ? 2015-03-09T07:00:00 2015-03-09T00:30:00
1999-03-01T00:00:00 1909-03-01T00:00:00 2015-03-09T12:30:00
4507-01-01T00:00:00 ? 2015-03-09T00:00:00 1958-10-09T00:00:00
? 2015-03-09T12:01:00
bar 2023-03-09T00:00:00 2015-11-09T00:00:00 2015-03-19T00:00:00
2015-03-17T00:00:00 2015-03-17T00:00:00 2015-03-09T08:00:00
2015-03-09T00:35:00 2001-02-12T00:00:00 ? 2015-03-09T12:35:00
4608-01-01T00:00:00 2015-11-09T00:00:00 2015-03-09T00:00:00
3000-11-09T00:00:00 ? 2015-03-09T12:01:00
here is a really really really long string value 2024-03-09T00:00:00
2015-12-09T00:00:00 2015-03-20T00:00:00 2015-03-18T00:00:00
2015-03-18T00:00:00 2015-03-09T09:00:00 2015-03-09T00:40:00
1999-04-04T17:17:00 1999-01-15T00:00:00 2015-03-09T12:40:00
4709-01-01T00:00:00 2015-12-09T00:00:00 2015-03-09T00:00:00
4015-12-09T00:00:00 ? 2015-03-09T12:01:00
foo 2025-03-09T00:00:00 2016-01-09T00:00:00 2015-03-23T00:00:00
2015-03-19T00:00:00 2015-03-19T00:00:00 2015-03-09T10:00:00
2015-03-09T00:41:00 1999-02-28T00:00:00 1999-02-28T00:00:00
2015-03-09T12:45:00 4710-01-01T00:00:00 2016-01-09T00:00:00
2015-03-09T00:00:00 2000-01-09T00:00:00 ? 2015-03-09T12:01:00
bar 2026-03-09T00:00:00 2016-02-09T00:00:00 2015-03-24T00:00:00
2015-03-20T00:00:00 2015-03-20T00:00:00 2015-03-09T11:00:00
2015-03-09T00:42:00 1999-04-04T14:14:00 2999-01-17T00:00:00
2015-03-09T12:55:00 4811-01-01T00:00:00 ? 2015-03-09T00:00:00
1999-02-09T00:00:00 ? 2015-03-09T12:01:00
bar 2027-03-09T00:00:00 2016-03-09T00:00:00 2015-03-25T00:00:00
2015-03-21T00:00:00 2015-03-21T00:00:00 2015-03-09T12:00:00
2015-03-09T00:43:00 2015-03-07T10:10:00 2999-06-04T00:00:00
2015-03-09T12:59:00 4912-01-01T00:00:00 2016-03-09T00:00:00
2015-03-09T00:00:00 1856-03-09T00:00:00 ? 2015-03-09T12:01:00
foo 2028-03-09T00:00:00 2016-04-09T00:00:00 2015-03-26T00:00:00
2015-03-22T00:00:00 2015-03-22T00:00:00 2015-03-09T13:00:00
2015-03-09T00:44:00 ? ? ? ? ? ? ?
? 2015-03-09T12:01:00
bar 2029-03-09T00:00:00 2016-05-09T00:00:00 2015-03-27T00:00:00
2015-03-23T00:00:00 2015-03-23T00:00:00 2015-03-09T14:00:00
2015-03-09T00:46:00 2007-11-08T00:00:00 1907-11-09T00:00:00
2015-03-09T12:00:00 3700-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
baz 2030-03-09T00:00:00 2016-06-09T00:00:00 2015-03-30T00:00:00
2015-03-24T00:00:00 2015-03-24T00:00:00 2015-03-09T15:00:00
2015-03-09T00:47:00 2007-10-03T00:00:00 1919-10-02T00:00:00
2015-03-09T12:03:00 4701-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
foo 2031-03-09T00:00:00 2016-07-09T00:00:00 2015-03-31T00:00:00
2015-03-25T00:00:00 ? 2015-03-09T16:00:00 2015-03-09T00:48:00
1999-04-06T03:00:00 2000-02-03T00:00:00 2015-03-09T12:08:00
4602-01-01T00:00:00 ? 2015-03-09T00:00:00 ? ?
2015-03-09T12:01:00
foo 2032-03-09T00:00:00 2016-08-09T00:00:00 2015-04-01T00:00:00
2015-03-26T00:00:00 2015-03-26T00:00:00 2015-03-09T17:00:00
2015-03-09T00:49:00 2025-10-12T00:00:00 2025-10-10T00:00:00
2015-03-09T12:10:00 4213-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
but 2033-03-09T00:00:00 2016-09-09T00:00:00 2015-04-02T00:00:00
2015-03-27T00:00:00 2015-03-27T00:00:00 2015-03-09T18:00:00
2015-03-09T00:51:00 2004-02-20T00:00:00 2014-02-23T00:00:00
2015-03-09T12:15:00 4304-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
foo 2034-03-09T00:00:00 2016-10-09T00:00:00 2015-04-03T00:00:00
2015-03-28T00:00:00 2015-03-28T00:00:00 2015-03-09T19:00:00
2015-03-09T00:52:00 2019-03-05T00:00:00 3019-03-04T00:00:00
2015-03-09T12:20:00 4405-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
foo 2035-03-09T00:00:00 2016-11-09T00:00:00 2015-04-06T00:00:00
2015-03-29T00:00:00 2015-03-29T00:00:00 2015-03-09T20:00:00
2015-03-09T00:54:00 1999-04-05T02:39:00 ? 2015-03-09T12:25:00
4506-01-01T00:00:00 ? 2015-03-09T00:00:00 ? ?
2015-03-09T12:01:00
foo 2036-03-09T00:00:00 2016-12-09T00:00:00 2015-04-07T00:00:00
2015-03-30T00:00:00 2015-03-30T00:00:00 2015-03-09T21:00:00
2015-03-09T00:55:00 1999-03-03T00:00:00 1911-03-02T00:00:00
2015-03-09T12:30:00 4607-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:00
foo 2037-03-09T00:00:00 2017-01-09T00:00:00 2015-04-08T00:00:00
2015-03-31T00:00:00 2015-03-31T00:00:00 2015-03-09T22:00:00
2015-03-09T00:57:00 2001-02-14T00:00:00 ? 2015-03-09T12:35:00
4618-01-01T00:00:00 ? 2015-03-09T00:00:00 ? ?
2015-03-09T12:01:00
foo 2038-03-09T00:00:00 2017-02-09T00:00:00 2015-04-09T00:00:00
2015-04-01T00:00:00 2015-04-01T00:00:00 2015-03-09T23:00:00
2015-03-09T00:59:00 1999-04-07T16:16:00 1999-01-14T00:00:00
2015-03-09T12:40:00 4659-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? 2015-03-09T12:01:28
foo 2039-03-09T00:00:00 2017-03-09T00:00:00 2015-04-10T00:00:00
2015-04-02T00:00:00 2015-04-02T00:00:00 2015-03-09T00:00:00 ?
1999-02-27T00:00:00 1999-02-25T00:00:00 2015-03-09T12:44:00
4612-01-01T00:00:00 ? 2015-03-09T00:00:00 ? ? ?
foo 2040-03-09T00:00:00 2017-04-09T00:00:00 2015-04-13T00:00:00
2015-04-03T00:00:00 2015-04-03T00:00:00 2015-03-10T01:00:00
2015-03-09T00:54:00 1999-04-03T14:14:00 2999-01-12T00:00:00
2015-03-09T12:54:00 4821-01-01T00:00:00 ? 2015-03-09T00:00:00
? ? ?
foo 2041-03-09T00:00:00 2017-05-09T00:00:00 2015-04-14T00:00:00
2015-04-04T00:00:00 2015-04-04T00:00:00 2015-03-10T02:00:00 ?
2015-03-06T10:10:00 2999-06-03T00:00:00 2015-03-09T12:58:00
5912-01-01T00:00:00 ? 2015-03-09T00:00:00 ? ? ?
bar 2042-03-09T00:00:00 2017-06-09T00:00:00 2015-04-15T00:00:00
2015-04-05T00:00:00 2015-04-05T00:00:00 2015-03-11T03:00:00
2015-03-09T00:54:00 ? ? ? ? ? ? ?
? ?
bar 2043-03-09T00:00:00 2017-07-09T00:00:00 2015-04-16T00:00:00
2015-04-06T00:00:00 ? 2015-03-10T04:00:00 ? ? ?
? ? ? ? ? ? ?
bar 2044-03-09T00:00:00 2017-08-09T00:00:00 2015-04-17T00:00:00
2015-04-07T00:00:00 2015-04-07T00:00:00 2015-03-11T05:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2045-03-09T00:00:00 2017-09-09T00:00:00 2015-04-20T00:00:00
2015-04-08T00:00:00 2015-04-08T00:00:00 2015-03-10T06:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2046-03-09T00:00:00 2017-10-09T00:00:00 2015-04-21T00:00:00
2015-04-09T00:00:00 2015-04-09T00:00:00 2015-03-11T07:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2047-03-09T00:00:00 2017-11-09T00:00:00 2015-04-22T00:00:00
2015-04-10T00:00:00 2015-04-10T00:00:00 2015-03-10T08:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2048-03-09T00:00:00 2017-12-09T00:00:00 2015-04-23T00:00:00
2015-04-11T00:00:00 2015-04-11T00:00:00 2015-03-11T09:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2049-03-09T00:00:00 2018-01-09T00:00:00 2015-04-24T00:00:00
2015-04-12T00:00:00 2015-04-12T00:00:00 2015-03-10T10:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2050-03-09T00:00:00 2018-02-09T00:00:00 2015-04-27T00:00:00
2015-04-13T00:00:00 2015-04-13T00:00:00 2015-03-11T11:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2051-03-09T00:00:00 2018-03-09T00:00:00 2015-04-28T00:00:00
2015-04-14T00:00:00 2015-04-14T00:00:00 2015-03-10T12:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2052-03-09T00:00:00 2018-04-09T00:00:00 2015-04-29T00:00:00
2015-04-15T00:00:00 2015-04-15T00:00:00 2015-03-10T13:00:00 ?
? ? ? ? ? ? ? ? ?
bar 2053-03-09T00:00:00 2018-05-09T00:00:00 2015-04-30T00:00:00
2015-04-16T00:00:00 2015-04-16T00:00:00 2015-03-10T14:00:00 ?
? ? ? ? ? ? ? ? ?
? 2054-03-09T00:00:00 2018-06-09T00:00:00 2015-05-01T00:00:00
2015-04-17T00:00:00 2015-04-17T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2055-03-09T00:00:00 2018-07-09T00:00:00 2015-05-04T00:00:00
2015-04-18T00:00:00 2015-04-18T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2056-03-09T00:00:00 2018-08-09T00:00:00 2015-05-05T00:00:00
2015-04-19T00:00:00 ? ? ? ? ? ? ?
? ? ? ? ?
? 2057-03-09T00:00:00 2018-09-09T00:00:00 2015-05-06T00:00:00
2015-04-20T00:00:00 2015-04-20T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2058-03-09T00:00:00 2018-10-09T00:00:00 2015-05-07T00:00:00
2015-04-21T00:00:00 2015-04-21T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2059-03-09T00:00:00 2018-11-09T00:00:00 2015-05-08T00:00:00
2015-04-22T00:00:00 2015-04-22T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2060-03-09T00:00:00 2018-12-09T00:00:00 2015-05-11T00:00:00
2015-04-23T00:00:00 2015-04-23T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2061-03-09T00:00:00 2019-01-09T00:00:00 2015-05-12T00:00:00
2015-04-24T00:00:00 2015-04-24T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2062-03-09T00:00:00 2019-02-09T00:00:00 2015-05-13T00:00:00
2015-04-25T00:00:00 2015-04-25T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? 2063-03-09T00:00:00 2019-03-09T00:00:00 2015-05-14T00:00:00
2015-04-26T00:00:00 2015-04-26T00:00:00 ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ?
{code}
> cannot read null dates from csv file
> ------------------------------------
>
> Key: SPARK-17039
> URL: https://issues.apache.org/jira/browse/SPARK-17039
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.0
> Reporter: Barry Becker
>
> I see this exact same bug as reported in this [stack overflow
> post|http://stackoverflow.com/questions/38265640/spark-2-0-pre-csv-parsing-error-if-missing-values-in-date-column]
> using Spark 2.0.0 (released version).
> In scala, I read a csv using
> sqlContext.read
> .format("csv")
> .option("header", "false")
> .option("inferSchema", "false")
> .option("nullValue", "?")
> .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss")
> .schema(dfSchema)
> .csv(dataFile)
> The data contains some null dates (represented with ?).
> The error I get is:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0
> (TID 10, localhost): java.text.ParseException: Unparseable date: "?"
> {code}