vinishjail97 commented on PR #10872:
URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105950888
> hey @vinishjail97 : can you attach the memory profiling you did before
> and after this patch. and rebase w/ master. we are good to go
15th March: Basic OOM test (consume 2M events, each payload approximately
1KB, with maxExecutors set to 2 and 1GB of executor memory). The dynamic
allocation ratio was 0.002, so effectively only one executor is used because
the stage does not spawn enough tasks to request a second one.
driver:
  coreLimit: 2000m
  coreRequest: 1800m
  cores: 2
  labels:
    orgId: 0c043996-9e42-4904-95b9-f98918ebeda4
    version: 3.1.1
  memory: 2g
  serviceAccount: staging-spark
dynamicAllocation:
  enabled: true
  initialExecutors: 0
  maxExecutors: 2
  minExecutors: 0
executor:
  coreLimit: 1000m
  coreRequest: 750m
  cores: 1
  labels:
    orgId: 0c043996-9e42-4904-95b9-f98918ebeda4
    version: 3.1.1
  memory: 1g
Without the fix, the stage failed with an executor OOM after 20 min.

With the fix, the same stage completed in 17 min on a single executor.
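
For context on why a ratio of 0.002 keeps the test on one executor, here is a rough sketch of the arithmetic. It assumes the "dynamic allocation ratio" refers to `spark.dynamicAllocation.executorAllocationRatio`, and that the executor target is approximately ceil(pending tasks × ratio / tasks per executor), clamped to the min/max bounds. The function name and the sample task counts are illustrative only and are not taken from the actual run.

```python
import math


def target_executors(pending_tasks, allocation_ratio, executor_cores,
                     task_cpus=1, min_executors=0, max_executors=2):
    """Approximate executor target under dynamic allocation (assumed formula)."""
    # Number of tasks one executor can run concurrently.
    tasks_per_executor = executor_cores // task_cpus
    # Scale the pending task count by the ratio, then round up.
    needed = math.ceil(pending_tasks * allocation_ratio / tasks_per_executor)
    # Clamp to the configured bounds (minExecutors / maxExecutors).
    return max(min_executors, min(max_executors, needed))


# Settings from the test: ratio 0.002, 1 core per executor, maxExecutors 2.
# The task counts below are hypothetical.
for tasks in (1, 200, 501, 100000):
    print(tasks, target_executors(tasks, 0.002, 1))
```

Under this approximation, a second executor is requested only once more than roughly 500 tasks (1 / 0.002) are pending, which would explain why the whole stage ran on one 1g executor.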
