Subject: Re: Writing DataFrame filter results to separate files

On Mon, Dec 5, 2016 at 5:33 PM, Michael Armbrust wrote:

> 1. In my case, I'd need to first explode my data by ~12x to assign each
> record to multiple 12-month rolling output windows. I'm not sure Spark SQL
> would be able to optimize this away, combining it with the output writing
> to do it incrementally.

You are right, but I wouldn't worry
Hi,

Thanks for the reply!

On Mon, Dec 5, 2016 at 1:30 PM, Michael Armbrust wrote:

> If you repartition($"column") and then do .write.partitionBy("column") you
> should end up with a single file for each value of the partition column.

I have two concerns there:

1. In my case, I'd need to first explode my data by ~12x to assign each
record to multiple 12-month rolling output windows. I'm not sure Spark SQL
would be able to optimize this away, combining it with the output writing
to do it incrementally.
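The ~12x explode described in that concern can be sketched as follows. This is only an illustration of the idea, not code from the thread; the column name `date` and the derived column `windowStart` are assumptions. Each record dated in month M belongs to the 12 rolling windows starting in months M-11 through M, so we attach an array of those 12 window-start months and explode it:

```scala
// Sketch only: assign each record to every 12-month rolling window that
// contains it, by exploding an array of the 12 candidate window starts.
// Assumes a date-typed column "date"; "windowStart" is a new column.
import org.apache.spark.sql.functions._

val withWindows = allData.withColumn(
  "windowStart",
  explode(array(
    // Window starting i months before this record's month, truncated
    // to the first of the month, for i = 0 .. 11.
    (0 until 12).map(i => trunc(add_months(col("date"), -i), "month")): _*)))
```

After this, every record appears 12 times, once per `windowStart` value, which is what makes the data ~12x larger before any grouping or writing happens.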
If you repartition($"column") and then do .write.partitionBy("column") you
should end up with a single file for each value of the partition column.

On Mon, Dec 5, 2016 at 10:59 AM, Everett Anderson wrote:

> Hi,
>
> I have a DataFrame of records with dates, and I'd like to write all
> 12-month (with overlap) windows to separate outputs.
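The repartition-then-partitionBy suggestion above can be sketched like this. The DataFrame name `df`, the column name `column`, and the output path are placeholders, and the output format is assumed to be Parquet:

```scala
// Sketch of the suggestion: repartition($"column") shuffles so that all rows
// sharing a value of "column" land in one task, and partitionBy("column")
// then writes each value's rows under its own directory -- yielding a
// single file per value.
import spark.implicits._

df.repartition($"column")
  .write
  .partitionBy("column")
  .parquet("/output/path")
```

The output ends up laid out as `/output/path/column=value1/`, `/output/path/column=value2/`, etc., with one file per directory because each value occupies exactly one partition.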
Hi,

I have a DataFrame of records with dates, and I'd like to write all
12-month (with overlap) windows to separate outputs.

Currently, I have a loop equivalent to:

for ((windowStart, windowEnd) <- windows) {
  val windowData = allData.filter(
    getFilterCriteria(windowStart, windowEnd))
  // ...
}
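A hypothetical expansion of that loop, showing what writing each window to a separate output might look like. The filter condition, the column name `date`, and the output root `outputRoot` are assumptions standing in for `getFilterCriteria` and the elided body:

```scala
// Sketch only: filter the full dataset once per window and write each
// window's records to its own directory. Note this re-scans allData on
// every iteration, which is the cost the thread is discussing.
import org.apache.spark.sql.functions._

for ((windowStart, windowEnd) <- windows) {
  val windowData = allData.filter(
    col("date") >= lit(windowStart) && col("date") < lit(windowEnd))
  windowData.write.parquet(s"$outputRoot/window=$windowStart")
}
```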