Hey Bolke, I didn't know this; I was under the same impression as Ruslan. Thanks for sharing.
> On Apr 24, 2018, at 2:12 PM, Bolke de Bruin <[email protected]> wrote:
>
> It actually can nowadays:
> https://cdn.oreillystatic.com/en/assets/1/event/269/HDFS%20on%20Kubernetes_%20Tech%20deep%20dive%20on%20locality%20and%20security%20Presentation.pptx
>
> We also have an on-premise setup with Ceph (s3a) and HDFS for when we need
> the speed, and Kubernetes for our workloads. We are kicking out YARN (and Hive,
> etc., for that matter).
>
> Bolke
>
>> On 24 Apr 2018, at 22:50, Ruslan Dautkhanov <[email protected]> wrote:
>>
>> Kubernetes is a "monolithic" one-level scheduler that can't handle what YARN
>> can, for example scheduling tasks local to the data.
>> Hadoop has multiple levels of data locality (node-local, rack-local), so
>> computation happens local to the data to minimize network
>> data transfer, which is expensive.
>> K8s wasn't designed to handle these scheduling scenarios, as far as I know.
>>
>> For cloud deployments where we don't have a data locality problem (because
>> S3 is used instead of storage local to the servers), k8s might be okay.
>>
>> There is a nice comparison [1] of k8s vs. two-level schedulers like YARN and
>> Mesos, although I think it's off-topic.
>>
>> We're mostly on-prem and we don't see Kubernetes taking over from YARN any
>> time soon.
>>
>> Thanks.
>>
>> [1]
>> https://aaltodoc.aalto.fi/bitstream/handle/123456789/27061/master_Ravula_Shashi_2017.pdf?sequence=1
>>
>> *2.3.2 Monolithic Schedulers*
>>
>> Monolithic schedulers use a single, centralized scheduling algorithm for
>> all jobs. All workload is run through the same scheduler and same
>> scheduling logic. Swarm, Fleet, Borg and Kubernetes adopt monolithic
>> schedulers. Kubernetes improved on the basic monolithic version of the Borg
>> and Swarm schedulers. These types of schedulers are not suitable for running
>> heterogeneous modern workloads which include Spark jobs, containers, and
>> other long-running jobs, etc.
>>
>> *2.3.3 Two Level Schedulers*
>>
>> Two-level schedulers address the drawbacks of a monolithic scheduler by
>> separating concerns of resource allocation and task placement. An active
>> resource manager offers compute resources to multiple parallel, independent
>> "scheduler frameworks". The Mesos cluster manager pioneered this approach,
>> and YARN supports a limited version of it. In Mesos, resources are offered
>> to application-level schedulers. This allows for custom, workload-specific
>> scheduling policies. The drawback with this type of scheduling architecture
>> is that the application-level frameworks cannot see all the possible
>> placement options anymore. Instead, they only see those options that
>> correspond to resources offered (Mesos) or allocated (YARN) by the resource
>> manager component. This makes priority preemption (higher-priority tasks
>> kick out lower-priority ones) difficult.
>>
>> --
>> Ruslan Dautkhanov
>>
>>> On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <[email protected]> wrote:
>>>
>>> Happy to have it as a contrib executor. However, I personally think YARN
>>> is a dead end. It has a lot of catching up to do and all the momentum is
>>> with Kubernetes.
>>>
>>> B.
>>>
>>>> On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <[email protected]> wrote:
>>>>
>>>> With Hadoop 3's Docker on YARN support, I think YARN becomes
>>>> somewhat of a competitor to Kubernetes.
>>>>
>>>> Great job on adding k8s support to Airflow.
>>>>
>>>> Similarly, I see Airflow could integrate with YARN and use
>>>> its infrastructure as an "executor". Has anyone explored the feasibility
>>>> of this approach?
>>>>
>>>> Thanks!
>>>> Ruslan Dautkhanov
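To make the node-local / rack-local distinction in Ruslan's message concrete, here is a minimal Python sketch of how a locality-aware scheduler might rank free hosts for a task whose input block lives on known hosts. It is an illustration under stated assumptions, not YARN's or Kubernetes' actual placement code: the `Locality` levels, the `pick_host` helper, and the host/rack names are all hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, Optional, Set, Tuple


class Locality(IntEnum):
    """Hypothetical locality levels; lower value = less data moved over the network."""
    NODE_LOCAL = 0  # the input block is on the candidate host itself
    RACK_LOCAL = 1  # the input block is on another host in the same rack
    OFF_RACK = 2    # the input block must cross the rack switch


def locality_of(host: str, data_hosts: Set[str], rack_of: Dict[str, str]) -> Locality:
    """Classify a candidate host relative to the hosts holding the task's input."""
    if host in data_hosts:
        return Locality.NODE_LOCAL
    data_racks = {rack_of[h] for h in data_hosts if h in rack_of}
    if rack_of.get(host) in data_racks:
        return Locality.RACK_LOCAL
    return Locality.OFF_RACK


def pick_host(
    free_hosts: Iterable[str], data_hosts: Set[str], rack_of: Dict[str, str]
) -> Optional[Tuple[str, Locality]]:
    """Return the free host with the best locality level, or None if no host is free."""
    best: Optional[Tuple[str, Locality]] = None
    for host in sorted(free_hosts):  # sorted so ties resolve deterministically
        level = locality_of(host, data_hosts, rack_of)
        if best is None or level < best[1]:
            best = (host, level)
    return best


if __name__ == "__main__":
    # Hypothetical cluster: two racks with two hosts each.
    rack_of = {"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r2"}
    data_hosts = {"h1"}  # a replica of the task's input block is on h1
    print(pick_host({"h1", "h3"}, data_hosts, rack_of))  # node-local host chosen
    print(pick_host({"h2", "h3"}, data_hosts, rack_of))  # rack-local beats off-rack
    print(pick_host({"h3", "h4"}, data_hosts, rack_of))  # only off-rack hosts left
```

This greedy version always takes the best currently free host. Real Hadoop-style schedulers commonly add a short wait (delay scheduling) before relaxing from node-local to rack-local, trading a little latency for less network traffic; the sketch leaves that out for brevity.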
