Raymond Xu created HUDI-5261:
--------------------------------
Summary: Use proper parallelism for engine context APIs
Key: HUDI-5261
URL: https://issues.apache.org/jira/browse/HUDI-5261
Project: Apache Hudi
Issue Type: Improvement
Components: performance
Reporter: Raymond Xu
Fix For: 0.12.2
Do a global search for usages of these APIs:
- org.apache.hudi.common.engine.HoodieEngineContext#flatMap
- org.apache.hudi.common.engine.HoodieEngineContext#map
and similar ones that take in a parallelism argument.
Many call sites pass the number of items as the parallelism, which hurts
performance. Parallelism should instead be based on the number of cores
available in the cluster and be set by the user via parallelism configs.
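
As a rough illustration of the intended change, a call site could cap the
parallelism it passes at a user-configured value rather than passing the raw
item count. The sketch below is standalone and does not use any Hudi classes.
The class name ParallelismSketch, the method boundedParallelism, and the
example values are hypothetical, and the actual fix may look different.

```java
// Hypothetical helper, not part of Hudi: picks the parallelism to pass to
// engine context calls such as map/flatMap.
final class ParallelismSketch {
    private ParallelismSketch() {}

    /**
     * Returns a parallelism that never exceeds the user-configured value
     * and never exceeds the number of items, with a floor of 1.
     *
     * @param numItems              number of items to process (assumed >= 0)
     * @param configuredParallelism value taken from a user-facing parallelism
     *                              config (assumed to reflect cluster cores)
     */
    static int boundedParallelism(int numItems, int configuredParallelism) {
        if (configuredParallelism < 1) {
            throw new IllegalArgumentException("configuredParallelism must be >= 1");
        }
        // Cap at the configured value so a huge list does not create one task per item.
        int capped = Math.min(numItems, configuredParallelism);
        // Keep at least 1 so an empty list does not produce a parallelism of 0.
        return Math.max(1, capped);
    }

    public static void main(String[] args) {
        // 100,000 items with a configured parallelism of 200: use 200, not 100,000.
        System.out.println(boundedParallelism(100_000, 200));
        // 5 items with the same config: no need for more than 5 tasks.
        System.out.println(boundedParallelism(5, 200));
    }
}
```

A call site that currently passes the size of its input list as the
parallelism would pass the result of such a helper instead, with the
configured value read from the relevant parallelism config.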
--
This message was sent by Atlassian Jira
(v8.20.10#820010)