Thanks!
I will take a look
Andy
From: Gourav Sengupta
Date: Tuesday, January 11, 2022 at 8:42 AM
To: Andrew Davidson
Cc: Andrew Davidson, "user @spark"
Subject: Re: How to add a row number column with out reordering my data frame
Hi,
I do not think we need to do any of that. Please try repartitionByRange;
Spark 3 also has adaptive query execution, with configurations to handle
skew as well.
Regards,
Gourav
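
For readers following the thread, here is a minimal sketch of what the suggestion above might look like in PySpark. It is an illustration under stated assumptions, not code from the thread: `spark` is assumed to be an existing Spark 3.x SparkSession, `left` and `right` are assumed to be DataFrames sharing a join column, and the helper names are hypothetical. The configuration keys are standard Spark 3 settings and `repartitionByRange` is the DataFrame method Gourav refers to. Whether this actually reduces memory pressure depends on the data and on the plan Spark chooses.

```python
# Sketch only: the helper names below are hypothetical, not from the thread.
# `spark` is assumed to be a SparkSession (Spark 3.x); `left` and `right`
# are assumed to be DataFrames that share the join column `key`.

# Standard Spark 3 settings that turn on adaptive query execution (AQE)
# and its skew-join handling.
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def enable_aqe_skew_handling(spark, conf=AQE_SKEW_CONF):
    """Apply each setting to the running session via spark.conf.set."""
    for name, value in conf.items():
        spark.conf.set(name, value)


def range_partitioned_join(left, right, key, num_partitions):
    """Range-partition both sides on the join key, then inner-join on it."""
    left_parts = left.repartitionByRange(num_partitions, key)
    right_parts = right.repartitionByRange(num_partitions, key)
    return left_parts.join(right_parts, on=key, how="inner")
```

A call might look like `enable_aqe_skew_handling(spark)` followed by `range_partitioned_join(df_a, df_b, "id", 200)`, where the partition count of 200 is only a placeholder to tune.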
On Tue, Jan 11, 2022 at 4:21 PM Andrew Davidson wrote:
Hi Gourav
When I join I get OOM. To address this, my thought was to split my tables into
small batches of rows, join the batches, and then combine the results with
union. My assumption is that union is a narrow transformation and as such
requires fewer resources. Let's say I have 5 data frames I want to join
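
The batching idea described above could be sketched roughly as follows. This is an illustration, not Andy's actual code: the function names are hypothetical, the inputs are assumed to be PySpark DataFrames, and `key` is assumed to be a simple column name. Rows are assigned to batches by hashing the join key (using the Spark SQL `hash` and `pmod` functions) so that matching keys from both tables land in the same batch; splitting by arbitrary row ranges would not give a correct join. Because Spark evaluates lazily, this alone does not guarantee lower peak memory.

```python
from functools import reduce


def bucket(df, key, num_batches, i):
    """Keep the rows whose hashed key falls in batch i (Spark SQL expression)."""
    return df.filter(f"pmod(hash({key}), {num_batches}) = {i}")


def batched_join(left, right, key, num_batches):
    """Join the two tables batch by batch, then union the per-batch results."""
    joined = [
        bucket(left, key, num_batches, i).join(
            bucket(right, key, num_batches, i), on=key, how="inner"
        )
        for i in range(num_batches)
    ]
    # unionByName matches columns by name rather than by position.
    return reduce(lambda a, b: a.unionByName(b), joined)
```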
Hi,
I am a bit confused here. It is not entirely clear to me why you are
creating the row numbers, and how creating the row numbers helps you with
the joins.
Can you please explain with some sample data?
Regards,
Gourav
On Fri, Jan 7, 2022 at 1:14 AM Andrew Davidson wrote:
Hi
I am trying to work through an OOM error. I have 10411 files. I want to select a
single column from each file and then join them into a single table.
The files have a unique row id. However, it is a very long string. The data file
with just the name and column of interest is about 470 M. The