[I] Fix Partitioners to avoid assuming that parallelism is always present [hudi]

via GitHub Sat, 29 Nov 2025 22:27:55 -0800


hudi-bot opened a new issue, #15764:
URL: https://github.com/apache/hudi/issues/15764


   Currently, `Partitioner` implementations assume that a parallelism level is 
always defined.
   
   This has not been an issue previously for the following reasons:
    * RDDs always have an inherent "parallelism" level, defined as the number of 
partitions they operate on. For a Dataset (SparkPlan), however, that is not 
necessarily the case (some SparkPlans might not report their output 
partitioning).
    * Additionally, our configs used to set a default parallelism level, which 
meant that we preferred that value over the parallelism of the actual incoming 
dataset.
   
   However, since we recently removed the default parallelism value from our 
configs, we now need to fix the Partitioners so that they do not assume 
that parallelism is always going to be present.
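
   As a rough illustration of the kind of guard this implies, here is a minimal 
sketch in plain Java. It is not Hudi code: the class `ParallelismSketch`, the 
method `resolve`, and the `fallback` parameter are hypothetical names, and the 
resolution order (explicit config first, then the partition count reported by 
the incoming dataset, then a caller-supplied fallback) is an assumption rather 
than the behavior the actual fix must have.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Hudi: picks a partition count without
// assuming that any parallelism value is available.
final class ParallelismSketch {
    private ParallelismSketch() {}

    /**
     * @param configured parallelism from user config, empty if not set
     * @param incoming   partition count reported by the incoming dataset,
     *                   empty if the plan does not report its partitioning
     * @param fallback   value to use when neither source is available (assumed > 0)
     */
    static int resolve(OptionalInt configured, OptionalInt incoming, int fallback) {
        if (fallback <= 0) {
            throw new IllegalArgumentException("fallback must be positive");
        }
        // Prefer an explicit, valid config value.
        if (configured.isPresent() && configured.getAsInt() > 0) {
            return configured.getAsInt();
        }
        // Otherwise use what the incoming dataset reports, if anything.
        if (incoming.isPresent() && incoming.getAsInt() > 0) {
            return incoming.getAsInt();
        }
        // Neither is present: do not fail, use the fallback.
        return fallback;
    }
}
```

   With this shape, a partitioner built on top of `resolve` would still get a 
positive partition count when both the config and the incoming dataset are 
silent about parallelism.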
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5716
   - Type: Bug
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
