Re: spark infers date to be timestamp type

2016-10-26 Thread Anand Viswanathan
Hi, you can supply a custom schema (with DateType for that column) and specify dateFormat in .option(), or, on the Spark DataFrame side, you can convert the timestamp to a date by applying cast to the column. Thanks and regards, Anand Viswanathan > On Oct 26, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
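For illustration, a minimal PySpark sketch of both suggestions. The file path, column names, and date pattern are placeholders, and a Spark 2.x SparkSession named spark is assumed:

    from pyspark.sql.types import StructType, StructField, StringType, DateType
    from pyspark.sql.functions import col

    # Option 1: declare the column as DateType up front and tell the CSV reader
    # which pattern to parse, so the column is not inferred as a timestamp.
    schema = StructType([
        StructField("id", StringType(), True),
        StructField("event_date", DateType(), True),   # hypothetical column name
    ])
    df = (spark.read
          .option("header", "true")
          .option("dateFormat", "yyyy-MM-dd")   # pattern of the date strings in the file
          .schema(schema)
          .csv("/path/to/data.csv"))            # placeholder path

    # Option 2: let Spark infer a timestamp, then cast the column down to a date.
    df2 = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/path/to/data.csv"))
    df2 = df2.withColumn("event_date", col("event_date").cast("date"))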

Re: driver OOM - need recommended memory for driver

2016-09-19 Thread Anand Viswanathan
> I guess my assumption that "default resources (memory and cores) can handle any application" is wrong. Thanks and regards, Anand Viswanathan > On Sep 19, 2016, at 6:56 PM, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > > If you make your driver memory

Re: driver OOM - need recommended memory for driver

2016-09-19 Thread Anand Viswanathan
Thank you so much, Kevin. My data size is around 4GB. I am not using collect(), take(), or takeSample(). At the final job, the number of tasks grows to about 200,000. Still, the driver crashes with OOM under the default --driver-memory of 1g, but the job succeeds if I specify 2g. Thanks and regards, Anand Viswanathan
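For reference, a sketch of a submit command with the larger driver heap; the master setting and script name are placeholders, not from the thread:

    # Raise the driver heap to 2g at submit time (values are illustrative).
    spark-submit \
      --master yarn \
      --driver-memory 2g \
      my_ml_job.py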

driver OOM - need recommended memory for driver

2016-09-19 Thread Anand Viswanathan
Hi, Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark. I am running a machine learning program, which runs perfectly when I specify 2G for --driver-memory. However, the program cannot run with the default 1G; the driver crashes with an OOM error. What is the recommended configuration for
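As a sketch of one way to make the larger heap the default (assuming you control the Spark installation's conf directory), the property can be set once in spark-defaults.conf instead of on every submit:

    # conf/spark-defaults.conf (illustrative): make 2g the default driver heap
    # so pyspark jobs do not need --driver-memory on each invocation.
    spark.driver.memory    2g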