[GitHub] [hudi] yihua commented on pull request #9722: [HUDI-6863] Revert auto-tuning of dedup parallelism

via GitHub Fri, 15 Sep 2023 10:08:38 -0700


yihua commented on PR #9722:
URL: https://github.com/apache/hudi/pull/9722#issuecomment-1721593449


   > Let's revisit the problem 6802 was tackling. The main issue it addressed 
is making our shuffle parallelism dynamic and relative to the incoming df's 
num partitions. So, if someone is running 1000s of pipelines, they don't need 
to statically set the right value for shuffle parallelism for each of the 1000 
pipelines.
   > 
   > Can you help me understand what issue we are hitting that warrants 
reverting it? Also, this would mean that we are going back to the old state 
where we expect users to explicitly configure the shuffle parallelism. If so, 
do we have a plan for dynamically choosing the right shuffle partition value 
depending on the incoming batch?
   
   This PR does not revert the dynamic determination of the shuffle 
parallelism.  The target shuffle parallelism, once decided, is passed into 
`deduplicateRecords` as the `int parallelism` argument.  Without the revert, 
the user loses the ability to override the parallelism through the shuffle 
parallelism configs, because `parallelism` can be ignored inside this method, 
and the rest of the write DAG then uses the new, auto-tuned parallelism.
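
To make the distinction concrete, here is a toy sketch of the two behaviours. It is not Hudi code: the class and all three method names are hypothetical, and the "ignoring" variant simply assumes the auto-tuning falls back to the input partition count, which may differ from what the reverted change actually did.

```java
import java.util.OptionalInt;

/** Toy model only; none of these methods exist in Hudi. */
final class DedupParallelismSketch {

    /** Hypothetical upstream step: an explicit user config wins, otherwise derive from the input. */
    static int resolveTargetParallelism(OptionalInt configured, int inputPartitions) {
        return configured.isPresent() ? configured.getAsInt() : Math.max(1, inputPartitions);
    }

    /** Dedup that honors the caller's decision: the shuffle uses exactly `parallelism` partitions. */
    static int dedupPartitionsHonoringParam(int parallelism, int inputPartitions) {
        return parallelism;
    }

    /** Dedup that re-tunes on its own: `parallelism` is accepted but never consulted (assumed behaviour). */
    static int dedupPartitionsIgnoringParam(int parallelism, int inputPartitions) {
        return Math.max(1, inputPartitions);
    }
}
```

In this model, a user who configures 200 on a 1000-partition input gets 200 partitions out of the honoring variant and 1000 out of the ignoring one, so the override is silently lost. When no value is configured, both variants yield 1000, which is why keeping the dynamic choice upstream and honoring the parameter are compatible.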


