Alexey Kudinkin created HUDI-5363:
-------------------------------------

             Summary: Remove default parallelism values for all ops
                 Key: HUDI-5363
                 URL: https://issues.apache.org/jira/browse/HUDI-5363
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin
             Fix For: 0.13.0


Currently, we always override the parallelism of the incoming datasets:
 # If the user specified the shuffle parallelism explicitly, we use that value to override the original parallelism
 # If the user did NOT specify the shuffle parallelism, we use the default value of 200
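The following is a minimal sketch of the current behavior described above. It is written as a pure Python function for illustration only. The names `current_target_parallelism` and `DEFAULT_SHUFFLE_PARALLELISM` are hypothetical and are not Hudi's actual API. The sketch also assumes that an unset config is represented as `None`. The incoming data's natural parallelism is accepted as an argument but never used:

```python
from typing import Optional

# Hypothetical constant mirroring the default of 200 described in the issue.
DEFAULT_SHUFFLE_PARALLELISM = 200


def current_target_parallelism(configured: Optional[int], natural: int) -> int:
    """Sketch of the CURRENT behavior (hypothetical helper, not Hudi code).

    `configured` is the user-supplied shuffle parallelism (None if unset);
    `natural` is the parallelism the incoming data already has.
    """
    if configured is not None:
        # Case 1: an explicit user value overrides the original parallelism.
        return configured
    # Case 2: no explicit value, so the static default is used and
    # `natural` is discarded entirely.
    return DEFAULT_SHUFFLE_PARALLELISM
```

For example, `current_target_parallelism(None, 1500)` returns 200. A dataset arriving with 1500 partitions is therefore reshuffled to 200, even though the user never asked for that.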

The second case is problematic: we blindly replace the "natural" parallelism of
the data (determined by the source of the data) with a static, unrelated value.

Instead, we should only override the parallelism in the following case:
 # The user explicitly provided an overriding value
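The following is a minimal sketch of the proposed rule. Like any illustration here, it is a hypothetical pure Python function (`proposed_target_parallelism`) and not Hudi's actual code path. It assumes that an unset config is `None` and that the caller already knows the natural parallelism, for example the partition count of the incoming data:

```python
from typing import Optional


def proposed_target_parallelism(configured: Optional[int], natural: int) -> int:
    """Sketch of the PROPOSED behavior (hypothetical helper, not Hudi code).

    Parallelism is overridden only when the user explicitly set a value;
    otherwise the incoming data keeps its natural parallelism.
    """
    if configured is not None:
        # The user explicitly provided an overriding value: honor it.
        return configured
    # No explicit value: keep the parallelism determined by the data source.
    return natural
```

Under this rule, `proposed_target_parallelism(None, 1500)` keeps 1500, and `proposed_target_parallelism(64, 1500)` returns the explicit 64. No static default appears anywhere, which is the point of the change.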



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
