Subject: Re: Writing DataFrame filter results to separate files

On Mon, Dec 5, 2016 at 5:33 PM, Michael Armbrust wrote:

> 1. In my case, I'd need to first explode my data by ~12x to assign each
> record to multiple 12-month rolling output windows. I'm not sure Spark SQL
> would be able to optimize this away, combining it with the output writing
> to do it incrementally.

You are right, but I wouldn't worry
Hi,

Thanks for the reply!

On Mon, Dec 5, 2016 at 1:30 PM, Michael Armbrust wrote:

> If you repartition($"column") and then do .write.partitionBy("column") you
> should end up with a single file for each value of the partition column.

I have two concerns there:

1. In my case, I'd need to first explode my data by ~12x to assign each
record to multiple 12-month rolling output windows. I'm not sure Spark SQL
would be able to optimize this away, combining it with the output writing
to do it incrementally.
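The ~12x explode described in that concern can be sketched as follows. This is only an illustration of the idea, not code from the thread; the column name `date` and the derived column `windowStart` are assumptions. Each record dated in month M belongs to the 12 rolling windows starting in months M-11 through M, so we attach an array of those 12 window-start months and explode it:

```scala
// Sketch only: assign each record to every 12-month rolling window that
// contains it, by exploding an array of the 12 candidate window starts.
// Assumes a date-typed column "date"; "windowStart" is a new column.
import org.apache.spark.sql.functions._

val withWindows = allData.withColumn(
  "windowStart",
  explode(array(
    // Window starting i months before this record's month, truncated
    // to the first of the month, for i = 0 .. 11.
    (0 until 12).map(i => trunc(add_months(col("date"), -i), "month")): _*)))
```

After this, every record appears 12 times, once per `windowStart` value, which is what makes the data ~12x larger before any grouping or writing happens.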
If you repartition($"column") and then do .write.partitionBy("column") you
should end up with a single file for each value of the partition column.

On Mon, Dec 5, 2016 at 10:59 AM, Everett Anderson wrote:

> Hi,
>
> I have a DataFrame of records with dates, and I'd like to write all
> 12-month (with overlap) windows to separate outputs.
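The repartition-then-partitionBy suggestion above can be sketched like this. The DataFrame name `df`, the column name `column`, and the output path are placeholders, and the output format is assumed to be Parquet:

```scala
// Sketch of the suggestion: repartition($"column") shuffles so that all rows
// sharing a value of "column" land in one task, and partitionBy("column")
// then writes each value's rows under its own directory -- yielding a
// single file per value.
import spark.implicits._

df.repartition($"column")
  .write
  .partitionBy("column")
  .parquet("/output/path")
```

The output ends up laid out as `/output/path/column=value1/`, `/output/path/column=value2/`, etc., with one file per directory because each value occupies exactly one partition.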
Hi,

I have a DataFrame of records with dates, and I'd like to write all
12-month (with overlap) windows to separate outputs.

Currently, I have a loop equivalent to:

for ((windowStart, windowEnd) <- windows) {
  val windowData = allData.filter(
    getFilterCriteria(windowStart, windowEnd))
  // ...
}
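A hypothetical expansion of that loop, showing what writing each window to a separate output might look like. The filter condition, the column name `date`, and the output root `outputRoot` are assumptions standing in for `getFilterCriteria` and the elided body:

```scala
// Sketch only: filter the full dataset once per window and write each
// window's records to its own directory. Note this re-scans allData on
// every iteration, which is the cost the thread is discussing.
import org.apache.spark.sql.functions._

for ((windowStart, windowEnd) <- windows) {
  val windowData = allData.filter(
    col("date") >= lit(windowStart) && col("date") < lit(windowEnd))
  windowData.write.parquet(s"$outputRoot/window=$windowStart")
}
```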