Hi Akira,

From the article, it's not clear to me what they mean by "sophisticated features". It is true that the container assignment code path is very complicated, and understanding it takes quite a bit of time and effort. So in order to speed up container assignment in large clusters, it might be necessary to rewrite it, losing certain features in the process - but which features those would be is not elaborated. But they didn't take this path and instead opted for multiple Hadoop clusters.
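On the question about flags: I can't say which features Uber had in mind, but some of the assignment behaviour is already configurable in capacity-scheduler.xml. A minimal sketch, assuming a Hadoop 3.x CapacityScheduler (property names are from the upstream docs, the values below are only illustrative, not recommendations):

  <!-- capacity-scheduler.xml (illustrative values only) -->
  <configuration>
    <!-- Assign containers from a dedicated thread instead of only on node heartbeats. -->
    <property>
      <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
      <value>true</value>
    </property>
    <!-- Missed scheduling opportunities to tolerate before giving up on node locality. -->
    <property>
      <name>yarn.scheduler.capacity.node-locality-delay</name>
      <value>40</value>
    </property>
    <!-- Compare resources on memory only instead of dominant-resource fairness. -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
  </configuration>

These are tuning knobs rather than switches for whatever "sophisticated features" the article refers to, so they don't really answer the question, but they show the kind of flag-based control Akira is asking about.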
Since they didn't share profiling results or heat maps, we can only guess what part of Capacity Scheduler is deemed slow or a possible bottleneck.

Peter

On Thu, Aug 12, 2021 at 9:48 AM Akira Ajisaka <aajis...@apache.org> wrote:
> Hi folks,
>
> I read Uber's article
> https://eng.uber.com/cost-efficient-big-data-platform/. This article
> is very interesting for me, and now I have some questions.
>
> > For example, we identified that the Capacity Scheduler has some complex
> logic that slows down task assignment. However, code changes to get rid of
> those won’t be able to merge into Apache Hadoop trunk, since those
> sophisticated features may be needed by other companies.
>
> - What are those sophisticated features in the Capacity Scheduler?
> - In the future, can we turn off the features by some flags in Apache
> Hadoop?
> - Is there any other examples like this?
>
> Thanks and regards,
> Akira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>