[GitHub] [hudi] xushiyan commented on issue #6610: [QUESTION] Faster approach for backfilling older data without stopping realtime pipelines

GitBox Wed, 14 Sep 2022 18:45:35 -0700


xushiyan commented on issue #6610:
URL: https://github.com/apache/hudi/issues/6610#issuecomment-1247472941


   Is this a Spark Streaming job you're running? Does it scale accordingly
when the backfill traffic spikes? The OOM also hints that you may need to tune
your Spark configs, e.g. executor memory (spark.executor.memory) and
spark.memory.storageFraction, to give more memory to execution.
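The following is a minimal Python sketch of Spark's unified memory split. It shows why lowering spark.memory.storageFraction leaves more room for execution. It assumes the documented defaults (spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and the fixed 300 MiB of reserved memory. It also treats the executor heap as roughly spark.executor.memory. The function name unified_memory_split is hypothetical and exists only for illustration.

```python
# Hypothetical helper illustrating Spark's unified memory model.
# Assumes the fixed 300 MiB reserved memory and the documented defaults
# spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5.
RESERVED_MIB = 300


def unified_memory_split(executor_heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified, storage_protected, execution_guaranteed) in MiB.

    unified: memory shared by execution and storage (cached blocks).
    storage_protected: cached blocks below this size are immune to eviction.
    execution_guaranteed: execution can always reclaim at least this much,
    because it may evict cached blocks above the protected region.
    """
    usable = executor_heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("executor heap must exceed the reserved memory")
    unified = usable * memory_fraction
    storage_protected = unified * storage_fraction
    execution_guaranteed = unified - storage_protected
    return unified, storage_protected, execution_guaranteed


if __name__ == "__main__":
    # Compare the default storageFraction with a lower value on a 4 GiB heap.
    for frac in (0.5, 0.3):
        u, s, e = unified_memory_split(4096, storage_fraction=frac)
        print(f"storageFraction={frac}: unified={u:.1f} storage={s:.1f} execution={e:.1f} MiB")
```

Lowering storage_fraction shrinks the protected storage region, so a larger share of the unified pool is guaranteed to execution (shuffles, joins, sorts), which is where write-heavy jobs typically run out of memory.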
   It looks like the order of records does not matter here, since you pump them
into the same topic. Why not start a separate batch job just for the backfill?
That's how people usually run backfill jobs.


