Re: Writing DataFrame filter results to separate files

2016-12-06 Thread Everett Anderson
On Mon, Dec 5, 2016 at 5:33 PM, Michael Armbrust wrote:
>> 1. In my case, I'd need to first explode my data by ~12x to assign each record to multiple 12-month rolling output windows. I'm not sure Spark SQL would be able to optimize this away, combining it with the

RE: Writing DataFrame filter results to separate files

2016-12-05 Thread Mendelson, Assaf
Subject: Re: Writing DataFrame filter results to separate files
1. In my case, I'd need to first explode my data by ~12x to assign each record to multiple 12-month rolling output windows. I'm not sure Spark SQL would be able to optimize this away, combining it with the output writing to do

Re: Writing DataFrame filter results to separate files

2016-12-05 Thread Michael Armbrust
> 1. In my case, I'd need to first explode my data by ~12x to assign each record to multiple 12-month rolling output windows. I'm not sure Spark SQL would be able to optimize this away, combining it with the output writing to do it incrementally.
You are right, but I wouldn't worry

Re: Writing DataFrame filter results to separate files

2016-12-05 Thread Everett Anderson
Hi,
Thanks for the reply!
On Mon, Dec 5, 2016 at 1:30 PM, Michael Armbrust wrote:
> If you repartition($"column") and then do .write.partitionBy("column") you should end up with a single file for each value of the partition column.
I have two concerns there:
1. In

Re: Writing DataFrame filter results to separate files

2016-12-05 Thread Michael Armbrust
If you repartition($"column") and then do .write.partitionBy("column") you should end up with a single file for each value of the partition column.
On Mon, Dec 5, 2016 at 10:59 AM, Everett Anderson wrote:
> Hi,
>
> I have a DataFrame of records with dates, and I'd like
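
For readers skimming the thread, the repartition-then-partitionBy suggestion could look roughly like the sketch below. This is only an illustration under assumptions: the sample rows, the `payload` column, the Parquet format, the local master, and the `/tmp/partitioned-output` path are all hypothetical and do not come from the original messages. Only `repartition($"column")` and `.write.partitionBy("column")` are taken from the thread.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its own config.
    val spark = SparkSession.builder()
      .appName("partitioned-write-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "column" stands in for the partition column.
    val df = Seq(
      ("2016-01", "a"),
      ("2016-01", "b"),
      ("2016-02", "c")
    ).toDF("column", "payload")

    df.repartition($"column")      // shuffle so rows sharing a value land in the same partition
      .write
      .mode("overwrite")           // assumption: replacing earlier output is acceptable
      .partitionBy("column")       // one output directory per distinct value, e.g. column=2016-01/
      .parquet("/tmp/partitioned-output") // hypothetical path and format

    spark.stop()
  }
}
```

Because every row with a given value sits in one partition after the shuffle, only one task writes into each `column=<value>` directory, which is why a single file per value is expected.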