Alexey Kudinkin created HUDI-5363:
-------------------------------------
Summary: Remove default parallelism values for all ops
Key: HUDI-5363
URL: https://issues.apache.org/jira/browse/HUDI-5363
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
Fix For: 0.13.0
Currently, we always override the parallelism of the incoming datasets:
# If the user specified shuffle parallelism explicitly, we use it to override
the original one
# If the user did NOT specify shuffle parallelism, we use the default value of 200
The second case is problematic: we blindly override the "natural" parallelism of
the data (determined by the source of the data) and replace it with
a static, unrelated value.
Instead, we should override the parallelism only in the following cases:
# The user provided an overriding value explicitly
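
To make the difference concrete, here is a minimal sketch of the two policies side by side. It is illustrative only and not taken from the Hudi codebase: the function names, the `natural` and `user_value` parameters, and the constant name are hypothetical, and only the value 200 comes from the description above.

```python
from typing import Optional

# Static default described in this issue (the constant name is hypothetical).
LEGACY_DEFAULT_PARALLELISM = 200


def legacy_parallelism(natural: int, user_value: Optional[int]) -> int:
    """Current behavior: the incoming parallelism is always overridden.

    `natural` is the parallelism implied by the data source; it is ignored here.
    """
    if user_value is not None:
        return user_value
    return LEGACY_DEFAULT_PARALLELISM


def proposed_parallelism(natural: int, user_value: Optional[int]) -> int:
    """Proposed behavior: override only when the user set a value explicitly."""
    if user_value is not None:
        return user_value
    return natural
```

For example, with an input that naturally has 1000 partitions and no user setting, the legacy policy would collapse it to 200, while the proposed policy would leave it at 1000. With an explicit user value, both policies return that value.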