Re: spark infers date to be timestamp type

2016-10-27 Steve Loughran
CSV type inference isn't really ideal: it does a full scan of the file to determine the types, so you are doubling the amount of data you need to read. Unless you are just exploring files in a notebook, I'd recommend inferring once, taking the schema from that, and then using it as the basis for your code.
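A minimal sketch of Steve's suggestion in Scala, assuming Spark 2.x or later: infer the schema once, keep it as a JSON string (via `StructType.json` and `DataType.fromJson`), and pass it to `.schema(...)` on later reads so they skip the inference pass. The object and method names (`InferOnceReuseSchema`, `writeSampleCsv`, `inferSchemaJson`, `readWithSchema`) are hypothetical, and JSON is only one way to persist the schema.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StructType}

object InferOnceReuseSchema {
  // Hypothetical helper: writes the test.csv contents from this thread to a temp file.
  def writeSampleCsv(): String = {
    val path = Files.createTempFile("test", ".csv")
    Files.write(path, "date\n2015-01-01\n2016-03-05\n".getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Exploration step: run inference once (the extra pass over the data)
  // and keep the resulting schema as a JSON string, e.g. stored next to the job.
  def inferSchemaJson(spark: SparkSession, path: String): String =
    spark.read
      .format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load(path)
      .schema
      .json

  // Production step: apply the stored schema directly, so no inference pass is run.
  def readWithSchema(spark: SparkSession, path: String, schemaJson: String): DataFrame = {
    val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]
    spark.read
      .format("csv")
      .option("header", true)
      .schema(schema)
      .load(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("infer-once").getOrCreate()
    val path = writeSampleCsv()
    val json = inferSchemaJson(spark, path)
    println(json)
    readWithSchema(spark, path, json).printSchema()
    spark.stop()
  }
}
```

The stored JSON can also be edited by hand before reuse, for example to change a column inferred as timestamp into date.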

Re: spark infers date to be timestamp type

2016-10-26 Hyukjin Kwon
Hi Koert,

Sorry, I thought you meant this was a regression between 2.0.0 and 2.0.1. I just checked: inferring DateType was not supported before either [1]. Yes, currently it only infers such data as timestamps.

[1]

Re: spark infers date to be timestamp type

2016-10-26 Anand Viswanathan
Hi,

You can supply a custom schema (with DateType for the column) and specify dateFormat in .option(). Alternatively, on the Spark DataFrame side, you can convert the timestamp to a date by casting the column.

Thanks and regards,
Anand Viswanathan

> On Oct 26, 2016, at 8:07 PM, Koert Kuipers wrote:
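Both of Anand's suggestions, sketched in Scala under some assumptions: Spark 2.x or later, the sample data from Koert's test.csv, and hypothetical names (`CsvDateColumn`, `writeSampleCsv`, `readWithDateSchema`, `readInferredThenCast`). Option 1 declares the column as DateType up front; option 2 keeps inference and casts the column afterwards.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, StructField, StructType}

object CsvDateColumn {
  // Hypothetical helper: writes the test.csv contents from this thread to a temp file.
  def writeSampleCsv(): String = {
    val path = Files.createTempFile("test", ".csv")
    Files.write(path, "date\n2015-01-01\n2016-03-05\n".getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Option 1: declare the column as DateType and tell the reader how dates are written.
  val dateSchema = StructType(Seq(StructField("date", DateType, nullable = true)))

  def readWithDateSchema(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("csv")
      .option("header", true)
      .option("dateFormat", "yyyy-MM-dd")
      .schema(dateSchema)
      .load(path)

  // Option 2: keep schema inference (which yields a timestamp here) and cast to a date.
  def readInferredThenCast(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load(path)
      .withColumn("date", col("date").cast(DateType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("csv-dates").getOrCreate()
    val path = writeSampleCsv()
    readWithDateSchema(spark, path).printSchema()
    readInferredThenCast(spark, path).printSchema()
    spark.stop()
  }
}
```

Option 1 also avoids the extra inference pass Steve describes, while option 2 still pays for it.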

Re: spark infers date to be timestamp type

2016-10-26 Koert Kuipers
Hey, I created a file called test.csv with these contents:

    date
    2015-01-01
    2016-03-05

Next I ran this code in Spark 2.0.1:

    spark.read
      .format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load("test.csv")
      .printSchema

The result is:

    root
     |-- date: timestamp (nullable = true)

Re: spark infers date to be timestamp type

2016-10-26 Hyukjin Kwon
There are now the timestampFormat option for TimestampType and the dateFormat option for DateType. Would you mind sharing your code?

On 27 Oct 2016 2:16 a.m., "Koert Kuipers" wrote:
> is there a reason a column with dates in format yyyy-mm-dd in a csv file
> is inferred to be
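For illustration, a sketch of how the two options might be combined in one read, assuming a Spark release whose CSV source supports both dateFormat and timestampFormat. The file layout (a `day` date column and a `seen_at` timestamp column), the format strings, and the names (`DateAndTimestampFormats`, `writeSampleCsv`, `read`) are all hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DateType, StructField, StructType, TimestampType}

object DateAndTimestampFormats {
  // Hypothetical sample file with one date column and one timestamp column.
  def writeSampleCsv(): String = {
    val path = Files.createTempFile("events", ".csv")
    val contents = "day,seen_at\n2015-01-01,2015-01-01 10:30:00\n2016-03-05,2016-03-05 23:59:59\n"
    Files.write(path, contents.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  val schema = StructType(Seq(
    StructField("day", DateType, nullable = true),         // parsed using dateFormat
    StructField("seen_at", TimestampType, nullable = true) // parsed using timestampFormat
  ))

  def read(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("csv")
      .option("header", true)
      .option("dateFormat", "yyyy-MM-dd")
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
      .schema(schema)
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("csv-formats").getOrCreate()
    read(spark, writeSampleCsv()).printSchema()
    spark.stop()
  }
}
```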

spark infers date to be timestamp type

2016-10-26 Koert Kuipers
Is there a reason a column with dates in the format yyyy-mm-dd in a CSV file is inferred to be TimestampType and not DateType?

thanks!
koert