nsivabalan commented on PR #9722: URL: https://github.com/apache/hudi/pull/9722#issuecomment-1721566580
Let's revisit the problem #6802 was tackling. The main issue it addressed was making our shuffle parallelism dynamic and relative to the incoming df's number of partitions, so that someone running 1000s of pipelines doesn't need to statically set the right shuffle parallelism for each of them. Can you help me understand what issue we are hitting that warrants reverting it? Also, reverting would mean going back to the old state, where we expect users to explicitly configure the shuffle parallelism. If so, do we have a plan for dynamically choosing the right shuffle partition value depending on the incoming batch?
