Alexey Kudinkin created HUDI-5716:
-------------------------------------
Summary: Fix Partitioners to avoid assuming that parallelism is
always present
Key: HUDI-5716
URL: https://issues.apache.org/jira/browse/HUDI-5716
Project: Apache Hudi
Issue Type: Bug
Components: writer-core
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
Fix For: 0.13.1
Currently, `Partitioner` impls assume that there's always going to be some
parallelism level.
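To show the failure mode concretely, here is a hypothetical partitioner. It is not Hudi code, and the class name plus the idea that an "unset" parallelism shows up as 0 are assumptions made for illustration. It uses the parallelism level directly as its partition count:

```java
// Hypothetical illustration (not Hudi code): a partitioner that takes the
// parallelism level as its number of output partitions and assumes it is positive.
public class NaiveHashPartitioner {
    private final int numPartitions;

    public NaiveHashPartitioner(int parallelism) {
        // No validation: silently assumes a parallelism level was always supplied.
        this.numPartitions = parallelism;
    }

    public int getPartition(Object key) {
        // Math.floorMod throws ArithmeticException when numPartitions == 0,
        // e.g. when no parallelism was configured and 0 leaks through as "unset".
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

With a parallelism of 4, every key maps into [0, 4). With 0, getPartition throws ArithmeticException ("/ by zero") instead of falling back to something sensible.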
This has not been an issue previously for the following reasons:
* RDDs always have an inherent "parallelism" level, defined as the number of
partitions they operate upon. However, for Datasets (SparkPlan) that's not
necessarily the case: some SparkPlans might not report their output
partitioning.
* Additionally, our configs previously set a default parallelism level, which
meant we'd prefer that value over the parallelism of the actual incoming
dataset.
However, since we've recently removed the default parallelism value from our
configs, we now need to fix Partitioners to make sure they do not assume that
parallelism is always going to be present.
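One way such a fix could look is sketched below. It treats every source of parallelism as optional and falls back explicitly instead of assuming a value exists. The class and method names are hypothetical, not Hudi's API. The precedence order (explicit config first, then the incoming data's partition count, then a fallback such as Spark's default parallelism) is also an assumption.

```java
import java.util.Optional;

// Hypothetical sketch (names are illustrative, not Hudi's API): resolve the
// number of output partitions without assuming any single source is present.
public final class ParallelismResolver {
    private ParallelismResolver() {}

    /**
     * @param configuredParallelism parallelism from the write config, if the user set one
     * @param inputPartitions       partition count reported by the incoming data, if known
     * @param fallback              last-resort value, e.g. the engine's default parallelism
     */
    public static int resolve(Optional<Integer> configuredParallelism,
                              Optional<Integer> inputPartitions,
                              int fallback) {
        if (fallback <= 0) {
            throw new IllegalArgumentException("fallback must be positive: " + fallback);
        }
        // Prefer an explicit, positive user setting.
        return configuredParallelism.filter(p -> p > 0)
                // Otherwise use the incoming data's own partitioning, if it reports one.
                .or(() -> inputPartitions.filter(p -> p > 0))
                // Otherwise use the fallback rather than handing 0 to the partitioner.
                .orElse(fallback);
    }
}
```

For example, resolve(Optional.empty(), Optional.of(8), 4) returns 8. With neither source present, it returns the fallback of 4. Optional.or requires Java 9 or later.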