CSV type inference isn't really ideal: it does a full scan of the file to
determine the types, so you are doubling the amount of data you need to read.
Unless you are just exploring files in a notebook, I'd recommend running the
inference once, getting the schema from it, and then using that schema as the
basis for your code.
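A minimal sketch of that approach (assuming an existing `SparkSession` named `spark` and the `test.csv` file from later in this thread):

```scala
// Infer the schema once; this is the read that scans the whole file.
val inferred = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("test.csv")
  .schema

// Reuse the captured schema for all subsequent reads:
// no inference pass, so the file is only scanned once per read.
val df = spark.read
  .format("csv")
  .option("header", true)
  .schema(inferred)
  .load("test.csv")
```

In production code you could also print `inferred` once and hard-code the resulting `StructType`, so the schema is fixed and reviewable.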
Hi Koert,
Sorry, I thought you meant this is a regression between 2.0.0 and 2.0.1. I
just checked: inferring DateType has never been supported [1].
Yes, currently it only infers such data as timestamps.
[1]
Hi,
you can use a custom schema (with DateType for that column) and specify
dateFormat in .option(),
or
on the Spark DataFrame side, you can convert the timestamp to a date by
casting the column.
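A sketch of both workarounds (assuming Spark 2.x, a `SparkSession` named `spark`, and the `test.csv` file with its single `date` column from the message below):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, StructField, StructType}

// Workaround 1: supply the schema explicitly instead of inferring it,
// and tell the CSV reader how the dates are formatted.
val schema = StructType(Seq(StructField("date", DateType, nullable = true)))
val df1 = spark.read
  .format("csv")
  .option("header", true)
  .option("dateFormat", "yyyy-MM-dd")
  .schema(schema)
  .load("test.csv")

// Workaround 2: let inference produce a timestamp column,
// then cast it down to a date on the DataFrame side.
val df2 = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("test.csv")
  .withColumn("date", col("date").cast(DateType))
```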
Thanks and regards,
Anand Viswanathan
> On Oct 26, 2016, at 8:07 PM, Koert Kuipers wrote:
hey,
i create a file called test.csv with contents:
date
2015-01-01
2016-03-05
next i run this code in spark 2.0.1:
spark.read
.format("csv")
.option("header", true)
.option("inferSchema", true)
.load("test.csv")
.printSchema
the result is:
root
|-- date: timestamp (nullable = true)
There are now separate options: timestampFormat for TimestampType and
dateFormat for DateType.
Would you mind sharing your code?
On 27 Oct 2016 2:16 a.m., "Koert Kuipers" wrote:
> is there a reason a column with dates in format yyyy-mm-dd in a csv file
> is inferred to be
is there a reason a column with dates in format yyyy-mm-dd in a csv file is
inferred to be TimestampType and not DateType?
thanks! koert