CSV type inference isn't really ideal: it does a full scan of the file to determine the column types, so you double the amount of data you need to read. Unless you are just exploring files in a notebook, I'd recommend running inference once, taking the schema it produces, and using that as the basis for the code where you actually define the schema. That's also the point where you can explicitly declare the column types if the inferred ones aren't great.
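As a rough illustration of that last step, here is a minimal sketch that turns an inferred schema into PySpark declaration source. It assumes you have saved the JSON string returned by df.schema.json() from the one-off inference run. The helper name schema_json_to_pyspark and its overrides parameter are hypothetical and not part of Spark, and only flat schemas with simple column types are handled.

```python
import json

# Spark's JSON type names mapped to PySpark type class names.
# Assumption: only simple, non-nested types are covered in this sketch.
SIMPLE_TYPES = {
    "string": "StringType",
    "integer": "IntegerType",
    "long": "LongType",
    "double": "DoubleType",
    "float": "FloatType",
    "boolean": "BooleanType",
    "timestamp": "TimestampType",
    "date": "DateType",
}


def schema_json_to_pyspark(schema_json, overrides=None):
    """Return PySpark source text declaring the schema in schema_json.

    schema_json: string as produced by df.schema.json() on an inferred DataFrame.
    overrides:   optional {column name: Spark type name} used to replace
                 an inferred type, e.g. {"date": "date"}.
    """
    overrides = overrides or {}
    lines = ["StructType(["]
    for field in json.loads(schema_json)["fields"]:
        type_name = overrides.get(field["name"], field["type"])
        # Nested types arrive as dicts and decimals as "decimal(p,s)";
        # both are rejected here rather than guessed at.
        if not isinstance(type_name, str) or type_name not in SIMPLE_TYPES:
            raise ValueError("unsupported type for column %r: %r"
                             % (field["name"], type_name))
        lines.append("    StructField(%r, %s(), %r),"
                     % (field["name"], SIMPLE_TYPES[type_name], field["nullable"]))
    lines.append("])")
    return "\n".join(lines)


# Schema JSON for a single column inferred as timestamp, overridden to date.
inferred = ('{"type":"struct","fields":[{"name":"date","type":"timestamp",'
            '"nullable":true,"metadata":{}}]}')
print(schema_json_to_pyspark(inferred, overrides={"date": "date"}))
```

The printed text can be pasted into the job and passed to spark.read.schema(...), so the file is read only once on every later run.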
(maybe I should write something which prints out the scala/py code for that declaration rather than having to do it by hand...)

On 27 Oct 2016, at 05:55, Hyukjin Kwon <gurwls...@gmail.com> wrote:

Hi Koert,

Sorry, I thought you meant this was a regression between 2.0.0 and 2.0.1. I just checked: inferring DateType was not supported before either [1]. Yes, such data is currently only inferred as timestamps.

[1] https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L85-L92

2016-10-27 9:12 GMT+09:00 Anand Viswanathan <anand_v...@ymail.com>:

Hi,

you can use the customSchema (for DateType) and specify dateFormat in .option(). Or, on the Spark DataFrame side, you can convert the timestamp to a date by casting the column.

Thanks and regards,
Anand Viswanathan

On Oct 26, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com> wrote:

hey,
i create a file called test.csv with contents:

    date
    2015-01-01
    2016-03-05

next i run this code in spark 2.0.1:

    spark.read
      .format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load("test.csv")
      .printSchema

the result is:

    root
     |-- date: timestamp (nullable = true)

On Wed, Oct 26, 2016 at 7:35 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

There are now the options timestampFormat for TimestampType and dateFormat for DateType.

Would you mind sharing your code?

On 27 Oct 2016 2:16 a.m., "Koert Kuipers" <ko...@tresata.com> wrote:

is there a reason a column with dates in format yyyy-mm-dd in a csv file is inferred to be TimestampType and not DateType?

thanks! koert