[
https://issues.apache.org/jira/browse/HUDI-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-5716:
---------------------------------
Labels: pull-request-available (was: )
> Fix Partitioners to avoid assuming that parallelism is always present
> ---------------------------------------------------------------------
>
> Key: HUDI-5716
> URL: https://issues.apache.org/jira/browse/HUDI-5716
> Project: Apache Hudi
> Issue Type: Bug
> Components: writer-core
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1
>
>
> Currently, `Partitioner` impls assume that there's always going to be some
> parallelism level.
> This has not been an issue previously for the following reasons:
> * RDDs always have an inherent "parallelism" level, defined as the number of
> partitions they operate upon. However, for a Dataset (SparkPlan) that's not
> necessarily the case (some SparkPlans might not report their output
> partitioning)
> * Additionally, we previously had a default parallelism level set in our
> configs, which meant that we'd prefer it over the parallelism of the actual
> incoming dataset.
> However, since we've recently removed the default parallelism value from our
> configs, we now need to fix Partitioners to make sure they do not assume
> that parallelism is always going to be present.
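
For illustration only, the fallback described above could be sketched roughly as follows. This is not Hudi's actual code: the class `ParallelismResolver`, the method `resolve`, and the constant `FALLBACK_PARALLELISM` are hypothetical names, and falling back to 1 is an assumption rather than what the fix does. The sketch only shows the idea of treating both the configured value and the input's reported partition count as optional.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Hudi: resolves a partition count without
// assuming that any single source of parallelism is present.
public class ParallelismResolver {

    // Assumed last-resort value, used when nothing else is known.
    static final int FALLBACK_PARALLELISM = 1;

    /**
     * Resolution order (an assumption for this sketch):
     *   1. explicitly configured parallelism, if set and positive
     *   2. partition count reported by the incoming data, if any
     *   3. the fallback constant
     */
    static int resolve(OptionalInt configured, OptionalInt reportedByInput) {
        if (configured.isPresent() && configured.getAsInt() > 0) {
            return configured.getAsInt();
        }
        if (reportedByInput.isPresent() && reportedByInput.getAsInt() > 0) {
            return reportedByInput.getAsInt();
        }
        return FALLBACK_PARALLELISM;
    }

    public static void main(String[] args) {
        // Config present: it is preferred over the input's partition count.
        System.out.println(resolve(OptionalInt.of(200), OptionalInt.of(8)));
        // No config: use what the input reports (the RDD-like case).
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(8)));
        // Neither present (e.g. a plan that reports no output partitioning).
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty()));
    }
}
```

The point of the sketch is that the caller never dereferences a parallelism value that might be absent; each source is checked before use.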