[
https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645041#comment-17645041
]
Jonathan Vexler commented on HUDI-5261:
---------------------------------------
TimelineServerPerf has numExecutors with a default of 10,
but it also has numCoresPerExecutor with a default of 10.
Something seems off here. Maybe it's supposed to be numExecutors per core?
Whatever the intent, those two configs seem to conflict.
> Use proper parallelism for engine context APIs
> ----------------------------------------------
>
> Key: HUDI-5261
> URL: https://issues.apache.org/jira/browse/HUDI-5261
> Project: Apache Hudi
> Issue Type: Improvement
> Components: performance
> Reporter: Raymond Xu
> Assignee: Jonathan Vexler
> Priority: Critical
> Fix For: 0.12.2
>
>
> Do a global search for these APIs:
> - org.apache.hudi.common.engine.HoodieEngineContext#flatMap
> - org.apache.hudi.common.engine.HoodieEngineContext#map
> and similar ones that take in parallelism.
> Many occurrences use the number of items as the parallelism, which hurts
> performance. Parallelism should be based on the number of cores available in
> the cluster and set by the user via parallelism configs.
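
One way to read the request above is to cap the parallelism at a user-configured value instead of passing the item count straight through. The following is only a minimal sketch of that idea: the class ParallelismSketch, the helper boundedParallelism, and the example value 200 are hypothetical and do not come from the Hudi codebase.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration; none of these names exist in Hudi.
class ParallelismSketch {

  // Returns the parallelism to use for a job over numItems items:
  // never more than the configured value, never more than the item
  // count, and never less than 1 (so an empty input is still valid).
  static int boundedParallelism(int numItems, int configuredParallelism) {
    if (configuredParallelism < 1) {
      throw new IllegalArgumentException("configuredParallelism must be >= 1");
    }
    return Math.max(1, Math.min(numItems, configuredParallelism));
  }

  public static void main(String[] args) {
    // Stand-in for a large list of partitions or file paths.
    List<Integer> items = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

    // Assumed to come from a user-facing parallelism config.
    int configuredParallelism = 200;

    // Pattern the issue describes: the item count used as the parallelism.
    int itemCountAsParallelism = items.size();

    // Bounded alternative; this value would be passed as the parallelism
    // argument of the engine context call instead of items.size().
    int bounded = boundedParallelism(items.size(), configuredParallelism);

    System.out.println("item count as parallelism: " + itemCountAsParallelism);
    System.out.println("bounded parallelism: " + bounded);
  }
}
```

With a bound like this, a small input still gets one task per item, while a large input no longer creates one task per item regardless of how many cores the cluster has.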