After setting the shuffle parallelism, I am able to complete the job without
failure, but there is one more challenge: reducing the GC time. Currently GC
takes 20 to 30% of the overall run time of each task.

I plan to test GC tuning with extra Java options by tomorrow.
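
For that test, here is a minimal sketch of how the options could be assembled
for spark-submit. The specific flags (G1 plus Java 8 style GC logging) are my
own assumption for a first run, not the settings from the tuning guide, and
the helper names are made up for illustration.

```python
# Sketch only: builds spark-submit "--conf" arguments for a GC test run.
# The JVM flags below are illustrative assumptions (Java 8 style), not the
# values recommended in the Hudi tuning guide.

def gc_test_conf():
    # Enable G1 and GC logging so GC pauses show up in the executor logs.
    gc_opts = " ".join([
        "-XX:+UseG1GC",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCDateStamps",
        "-XX:+PrintGCTimeStamps",
    ])
    return {
        "spark.executor.extraJavaOptions": gc_opts,
        "spark.driver.extraJavaOptions": gc_opts,
    }

def to_submit_args(conf):
    # Each setting becomes a separate "--conf key=value" pair.
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

if __name__ == "__main__":
    # Print one argument per line; values contain spaces, so they need
    # quoting if pasted into a shell command.
    for arg in to_submit_args(gc_test_conf()):
        print(arg)
```

The resulting list can be passed to spark-submit as-is from a launcher script,
which avoids shell quoting problems with the spaces inside the option string.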

My goal is to update 25 billion rows spanning 100 daily partitions, with
240 million records (2 GB) in each partition. About 50% of the updates land on
the previous day's partition and the rest are spread across the remaining 99 days.

Currently it takes 30 to 40 minutes just to write into one partition. Of this,
20 to 30% of the time goes to GC.
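
As a rough back-of-the-envelope check, the numbers above translate into the
following GC cost per partition write. This assumes the per-task GC share
applies evenly to the wall-clock write time, which is only an approximation.

```python
# Rough estimate: minutes of a single partition write spent in GC.
# Assumes the per-task GC percentage applies evenly to wall-clock time.

def gc_minutes(write_minutes, gc_percent):
    return write_minutes * gc_percent / 100

low = gc_minutes(30, 20)   # fastest write, lowest GC share
high = gc_minutes(40, 30)  # slowest write, highest GC share

print(f"GC per partition write: {low:.0f} to {high:.0f} minutes")
```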

If we can do this in less than one to two hours (incremental update: 240
million records daily) after tuning the memory and other parameters, I would
be very happy.




On Fri, Jul 19, 2019 at 12:19 AM Amarnath Venkataswamy <
[email protected]> wrote:

> Yes, I am looking for the same thing.
>
> On Thu, Jul 18, 2019 at 9:20 PM Vinoth Chandar <[email protected]> wrote:
>
>> No real reason. If you notice, a sample configuration is presented under the
>> “GC tuning” section, which asks the user to add it to extraJavaOptions. It's
>> separate because it's for CMS, and someone else may want to use G1.
>>
>> On Thu, Jul 18, 2019 at 5:26 PM Gary Li <[email protected]> wrote:
>>
>> > One related question. The GC tuning part says [must] use G1/CMS collector,
>> > but the recommended production config doesn’t specify any GC. Is there a
>> > reason behind this?
>> >
>> > On Thu, Jul 18, 2019 at 9:37 AM Vinoth Chandar <[email protected]> wrote:
>> >
>> > > https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide
>> > > https://hudi.apache.org/performance.html
>> > > are good resources for what you need.
>> > >
>> > > On Thu, Jul 18, 2019 at 7:37 AM Amarnath Venkataswamy <
>> > > [email protected]> wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > Can anyone share the Spark configuration used at Uber? I didn't
>> > > > save that link to my favorites.
>> > > >
>> > > > I am currently running some performance tests against 240 million
>> > > > records, and the job keeps failing for one memory-related reason or
>> > > > another.
>> > > >
>> > > > Regards
>> > > > Amarnath
>> > > >
>> > >
>> >
>>
>
