Kubernetes is a "monolithic" one-level scheduler that can't handle everything YARN can, for example scheduling tasks local to their data. Hadoop has multiple levels of data locality (node-local, rack-local), so computation runs close to the data to minimize network transfer, which is expensive. As far as I know, K8s wasn't designed to handle these scheduling scenarios.
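To make the locality levels concrete, here is a minimal sketch of locality-aware placement. It is not YARN's actual implementation; the names `Node`, `locality_level` and `pick_node` are hypothetical, and the "off-rack" fallback level is my own label for "anywhere else". It only shows the preference order: node-local first, then rack-local, then anything that is free.

```python
from dataclasses import dataclass

# Hypothetical model of a cluster node: a host name plus the rack it sits in.
@dataclass(frozen=True)
class Node:
    name: str
    rack: str

# Lower value means closer to the data, so it is preferred.
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2

def locality_level(candidate, replicas):
    """Classify a candidate node relative to the nodes holding the data."""
    if any(candidate.name == r.name for r in replicas):
        return NODE_LOCAL   # data is on the same host
    if any(candidate.rack == r.rack for r in replicas):
        return RACK_LOCAL   # data is one switch hop away
    return OFF_RACK         # data must cross racks

def pick_node(free_nodes, replicas):
    """Return the free node closest to the data, or None if none is free."""
    return min(free_nodes,
               key=lambda n: locality_level(n, replicas),
               default=None)

if __name__ == "__main__":
    replicas = [Node("n1", "r1"), Node("n2", "r2")]
    free = [Node("n9", "r9"), Node("n3", "r1")]
    # n3 shares rack r1 with a replica, so it beats the off-rack n9.
    print(pick_node(free, replicas))
```

A real scheduler would also weigh how long to wait for a closer node before settling for a farther one; this sketch leaves that out.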
For cloud deployments where we don't have the data locality problem (because S3 is used instead of storage local to the servers), k8s might be okay.

Here is a nice comparison [1] of k8s vs. two-level schedulers like YARN and Mesos, although I think it's off-topic. We're mostly on-prem and we don't see Kubernetes taking over from YARN any time soon.

Thanks.

[1] https://aaltodoc.aalto.fi/bitstream/handle/123456789/27061/master_Ravula_Shashi_2017.pdf?sequence=1

*2.3.2 Monolithic Schedulers*

Monolithic schedulers use a single, centralized scheduling algorithm for all jobs. All workload is run through the same scheduler and same scheduling logic. Swarm, Fleet, Borg and Kubernetes adopt monolithic schedulers. Kubernetes improvised on basic monolithic version of Borg and Swarm schedulers. This type of schedulers are not suitable for running heterogeneous modern workloads which include Spark jobs, containers, and other long running jobs, etc.

*2.3.3 Two Level Schedulers*

Two-level schedulers address the drawbacks of a monolithic scheduler by separating concerns of resource allocation and task placement. An active resource manager offers compute resources to multiple parallel, independent "scheduler frameworks". The Mesos cluster manager pioneered this approach, and YARN supports a limited version of it. In Mesos, resources are offered to application-level schedulers. This allows for custom, workload-specific scheduling policies. The drawback with this type of scheduling architecture is that the application level frameworks cannot see all the possible placement options anymore. Instead, they only see those options that correspond to resources offered (Mesos) or allocated (YARN) by the resource manager component. This makes priority preemption (higher priority tasks kick out lower priority ones) difficult.

-- 
Ruslan Dautkhanov

On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <[email protected]> wrote:

> Happy to have it as a contrib executor. However, I personally think yarn
> is a dead end. It has a lot of catching up to do and all the momentum is
> with kubernetes.
>
> B.
>
> Sent from my iPad
>
> > On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <[email protected]> wrote:
> >
> > With Hadoop 3's Docker on YARN support, I think YARN becomes
> > somewhat of a competitor to Kubernetes.
> >
> > Great job on adding k8s support to Airflow.
> >
> > Very similarly, I see Airflow could integrate with YARN and use
> > its infrastructure as an "executor". Has anyone explored the feasibility
> > of this approach?
> >
> > Thanks!
> > Ruslan Dautkhanov
