Raymond Xu created HUDI-5261:
--------------------------------

             Summary: Use proper parallelism for engine context APIs
                 Key: HUDI-5261
                 URL: https://issues.apache.org/jira/browse/HUDI-5261
             Project: Apache Hudi
          Issue Type: Improvement
          Components: performance
            Reporter: Raymond Xu
             Fix For: 0.12.2


Do a global search for usages of these APIs
- org.apache.hudi.common.engine.HoodieEngineContext#flatMap
- org.apache.hudi.common.engine.HoodieEngineContext#map

and of similar ones that take a parallelism argument.

Many call sites pass the number of items as the parallelism, which hurts 
performance. Parallelism should instead be based on the number of cores 
available in the cluster and be set by the user via parallelism configs.
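
A minimal sketch of the intended change, in plain Java so it runs without Hudi
on the classpath. The helper name boundedParallelism and the value 200 are
assumptions for illustration only; they are not existing Hudi APIs or
defaults. The idea is to cap the parallelism passed to the engine context at a
user-configured value instead of passing the item count directly.

```java
final class ParallelismSketch {

    // Hypothetical helper (not part of Hudi): never use more tasks than
    // there are items, never more than the user-configured limit, and
    // always at least one.
    static int boundedParallelism(int numItems, int configuredParallelism) {
        if (configuredParallelism < 1) {
            throw new IllegalArgumentException("configuredParallelism must be >= 1");
        }
        return Math.max(1, Math.min(numItems, configuredParallelism));
    }

    public static void main(String[] args) {
        int numItems = 10_000;  // e.g. number of partition paths to process
        int configured = 200;   // assumed value read from a parallelism config

        // Pattern the issue describes: item count used as parallelism.
        int before = Math.max(1, numItems);

        // Proposed pattern: item count capped by the configured value.
        int after = boundedParallelism(numItems, configured);

        System.out.println("before=" + before + ", after=" + after);
        // prints "before=10000, after=200"
    }
}
```

At a real call site, the capped value would be passed as the parallelism
argument of HoodieEngineContext#map or #flatMap in place of the list size.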



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
