Hi Akira,

From the article, it's not clear to me what they mean by "sophisticated features". It is true that the container assignment code path is very complicated, and understanding it takes quite a bit of time and effort. So in order to speed up container assignment in large clusters, it might be necessary to rewrite it, losing certain features in the process - but which features those would be is not elaborated. But they didn't take this path and instead opted for multiple Hadoop clusters.
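On the question about flags: I can't say which features Uber had in mind, but some of the assignment behaviour is already configurable in capacity-scheduler.xml. A minimal sketch, assuming a Hadoop 3.x CapacityScheduler (property names are from the upstream docs, the values below are only illustrative, not recommendations):

  <!-- capacity-scheduler.xml (illustrative values only) -->
  <configuration>
    <!-- Assign containers from a dedicated thread instead of only on node heartbeats. -->
    <property>
      <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
      <value>true</value>
    </property>
    <!-- Missed scheduling opportunities to tolerate before giving up on node locality. -->
    <property>
      <name>yarn.scheduler.capacity.node-locality-delay</name>
      <value>40</value>
    </property>
    <!-- Compare resources on memory only instead of dominant-resource fairness. -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
  </configuration>

These are tuning knobs rather than switches for whatever "sophisticated features" the article refers to, so they don't really answer the question, but they show the kind of flag-based control Akira is asking about.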
Since they didn't share profiling results or heat maps, we can only guess what part of Capacity Scheduler is deemed slow or a possible bottleneck.

Peter

On Thu, Aug 12, 2021 at 9:48 AM Akira Ajisaka <aajis...@apache.org> wrote:
> Hi folks,
>
> I read Uber's article
> https://eng.uber.com/cost-efficient-big-data-platform/. This article
> is very interesting for me, and now I have some questions.
>
> > For example, we identified that the Capacity Scheduler has some complex
> logic that slows down task assignment. However, code changes to get rid of
> those won’t be able to merge into Apache Hadoop trunk, since those
> sophisticated features may be needed by other companies.
>
> - What are those sophisticated features in the Capacity Scheduler?
> - In the future, can we turn off the features by some flags in Apache
> Hadoop?
> - Is there any other examples like this?
>
> Thanks and regards,
> Akira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>