Re: Airflow - YARN as an executor?

Ace Haidrey Tue, 24 Apr 2018 14:21:13 -0700

Hey Bolke, I didn't know this; I was under the same impression as Ruslan.
Thanks for sharing.


Sent from my iPhone

> On Apr 24, 2018, at 2:12 PM, Bolke de Bruin <[email protected]> wrote:
> 
> Kubernetes actually can handle data locality nowadays:
> https://cdn.oreillystatic.com/en/assets/1/event/269/HDFS%20on%20Kubernetes_%20Tech%20deep%20dive%20on%20locality%20and%20security%20Presentation.pptx
> 
> We also have an on-premises setup with Ceph (via s3a), HDFS for when we need
> the speed, and Kubernetes for our workloads. We are kicking out YARN (and Hive,
> etc., for that matter).
> 
> Bolke
> 
> 
> 
> Sent from my iPad
> 
>> On 24 Apr 2018, at 22:50, Ruslan Dautkhanov <[email protected]> wrote:
>> 
>> Kubernetes is a "monolithic" one-level scheduler that can't handle what YARN
>> can, for example scheduling tasks local to their data.
>> Hadoop has multiple levels of data locality (node-local, rack-local), so
>> computation happens close to the data to minimize network data transfer,
>> which is expensive.
>> K8s wasn't designed to handle these scheduling scenarios, as far as I know.
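To make the locality levels mentioned above concrete, here is a toy Python sketch of locality-aware placement: given the nodes holding replicas of a task's input block, it prefers a free node-local slot, then a rack-local one, then anything else. The function names, the rack map and the three-level ranking are illustrative assumptions, not YARN's actual scheduler code. With no replica locations at all (as when input lives in S3 rather than on the cluster's own disks), every node ranks the same and the choice falls back to list order.

```python
from typing import Dict, List, Optional, Set, Tuple

# Locality levels in order of preference (lower value = better placement).
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


def locality_level(node: str, replica_nodes: Set[str], rack_of: Dict[str, str]) -> int:
    """Classify how close `node` is to the data stored on `replica_nodes`."""
    if node in replica_nodes:
        return NODE_LOCAL
    replica_racks = {rack_of[n] for n in replica_nodes}
    if rack_of[node] in replica_racks:
        return RACK_LOCAL
    return OFF_RACK


def place_task(free_nodes: List[str], replica_nodes: Set[str],
               rack_of: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Pick the free node with the best locality level.

    Ties are broken by the order of `free_nodes` (min() keeps the first minimum).
    Returns None when no node is free.
    """
    if not free_nodes:
        return None
    best = min(free_nodes, key=lambda n: locality_level(n, replica_nodes, rack_of))
    return best, locality_level(best, replica_nodes, rack_of)
```

For example, with racks r1 = {n1, n2} and r2 = {n3, n4} and replicas on n1 and n3, `place_task(["n4", "n3"], {"n1", "n3"}, rack_of)` picks n3 as node-local, while `place_task(["n2", "n4"], ...)` can only get rack-local and returns n2.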
>> 
>> For cloud deployments, where we don't have a data locality problem (because
>> S3 is used instead of storage local to the servers), k8s might be okay.
>> 
>> There is a nice comparison [1] of k8s vs. two-level schedulers like YARN and
>> Mesos, although I think it's off-topic.
>> 
>> We're mostly on-prem, and we don't see Kubernetes taking over YARN any time
>> soon.
>> 
>> Thanks.
>> 
>> 
>> 
>> [1]
>> 
>> https://aaltodoc.aalto.fi/bitstream/handle/123456789/27061/master_Ravula_Shashi_2017.pdf?sequence=1
>> 
>> *2.3.2 Monolithic Schedulers*
>> 
>> Monolithic schedulers use a single, centralized scheduling algorithm for
>> all jobs. All workloads run through the same scheduler and the same
>> scheduling logic. Swarm, Fleet, Borg and Kubernetes adopt monolithic
>> schedulers. Kubernetes improved on the basic monolithic versions of the Borg
>> and Swarm schedulers. This type of scheduler is not suitable for running
>> heterogeneous modern workloads, which include Spark jobs, containers, other
>> long-running jobs, etc.
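A monolithic scheduler, in the sense used in the excerpt above, is one loop with one policy applied to every job. The sketch below illustrates that shape with a simple first-fit policy over free CPUs; the `Job` type, its `kind` field and first-fit itself are assumptions chosen for illustration, not how Borg, Swarm or Kubernetes actually score nodes. Note that `kind` is never consulted: a Spark job and a long-running service go through exactly the same logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    kind: str  # e.g. "spark", "service", "batch"; the policy below ignores it
    cpus: int


def monolithic_schedule(jobs: List[Job], free_cpus: Dict[str, int]) -> Dict[str, str]:
    """One centralized loop places every job with the same first-fit policy.

    Returns a mapping job name -> node; jobs that fit nowhere are left out.
    """
    free = dict(free_cpus)  # copy so the caller's capacity map is not modified
    placement: Dict[str, str] = {}
    for job in jobs:
        for node in sorted(free):  # deterministic node order
            if free[node] >= job.cpus:
                free[node] -= job.cpus
                placement[job.name] = node
                break
    return placement
```

For instance, with nodes `{"a": 6, "b": 4}`, a 4-CPU Spark job and a 2-CPU service both land on `a`, and a following 4-CPU batch job goes to `b`, all decided by the same rule.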
>> 
>> 
>> 
>> *2.3.3 Two-Level Schedulers*
>> 
>> Two-level schedulers address the drawbacks of a monolithic scheduler by
>> separating concerns of resource allocation and task placement. An active
>> resource manager offers compute resources to multiple parallel, independent
>> “scheduler frameworks”. The Mesos cluster manager pioneered this approach,
>> and YARN supports a limited version of it. In Mesos, resources are offered
>> to application-level schedulers. This allows for custom, workload-specific
>> scheduling policies. The drawback with this type of scheduling architecture
>> is that the application level frameworks cannot see all the possible
>> placement options anymore. Instead, they only see those options that
>> correspond to resources offered (Mesos) or allocated (YARN) by the resource
>> manager component. This makes priority preemption (higher priority tasks
>> kick out lower priority ones) difficult.
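The offer model described in the excerpt above can be sketched as follows: a resource manager decides which resources each framework is offered (level one), and each framework's own scheduler places its tasks only within that offer (level two). The names `first_fit_framework` and `offer_round`, and the static node-to-framework assignment, are hypothetical simplifications of what Mesos or YARN do; a single round is shown and consumed capacity is not tracked.

```python
from typing import Callable, Dict, List, Tuple

Offer = Dict[str, int]                  # node -> free CPUs offered
Placement = List[Tuple[str, str, int]]  # (task, node, cpus)


def first_fit_framework(tasks: Dict[str, int]) -> Callable[[Offer], Placement]:
    """Hypothetical application-level scheduler.

    Places its own pending tasks first-fit, but only on the resources it
    was offered; it has no view of the rest of the cluster.
    """
    pending = dict(tasks)

    def on_offer(offer: Offer) -> Placement:
        remaining = dict(offer)
        accepted: Placement = []
        for task, cpus in list(pending.items()):  # copy: we delete while looping
            for node in sorted(remaining):
                if remaining[node] >= cpus:
                    remaining[node] -= cpus
                    accepted.append((task, node, cpus))
                    del pending[task]
                    break
        return accepted

    return on_offer


def offer_round(cluster: Offer,
                frameworks: Dict[str, Callable[[Offer], Placement]],
                assignment: Dict[str, List[str]]) -> Dict[str, Placement]:
    """Level one: the resource manager decides WHICH nodes each framework sees.
    Level two: each framework decides task placement within its offer."""
    results: Dict[str, Placement] = {}
    for name, scheduler in frameworks.items():
        offer = {node: cluster[node] for node in assignment[name]}
        results[name] = scheduler(offer)
    return results
```

For instance, with `cluster = {"n1": 8, "n2": 2}`, a Spark framework needing a 6-CPU executor that is offered only n2 places nothing, while the spare capacity on n1 stays invisible to it. That is the visibility drawback the excerpt describes.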
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Ruslan Dautkhanov
>> 
>>> On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <[email protected]> wrote:
>>> 
>>> Happy to have it as a contrib executor. However, I personally think YARN
>>> is a dead end. It has a lot of catching up to do, and all the momentum is
>>> with Kubernetes.
>>> 
>>> B.
>>> 
>>> Sent from my iPad
>>> 
>>>> On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <[email protected]> wrote:
>>>> 
>>>> With Hadoop 3's Docker-on-YARN support, I think YARN becomes
>>>> somewhat of a competitor to Kubernetes.
>>>> 
>>>> Great job on adding k8s support to Airflow.
>>>> 
>>>> Similarly, I could see Airflow integrating with YARN and using
>>>> its infrastructure as an "executor". Has anyone explored the feasibility
>>>> of this approach?
>>>> 
>>>> 
>>>> Thanks!
>>>> Ruslan Dautkhanov
>>>
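Using YARN as an Airflow executor would mean plugging a cluster resource manager in behind the executor's queue, launch and poll loop. The toy sketch below shows only that general loop; `ClusterExecutor`, `FakeResourceManager`, `queue_command`, `heartbeat` and the state strings are hypothetical names, and this is neither Airflow's executor interface nor YARN's client API. A real integration would presumably submit each task command as a YARN application (or container) and poll for its status; that part is stubbed out here.

```python
import itertools
from typing import Dict, List, Tuple


class FakeResourceManager:
    """Hypothetical stand-in for a cluster resource manager.

    Purely illustrative: it does not run anything, and it reports an
    application as finished the first time its status is polled.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._apps: Dict[str, str] = {}

    def submit(self, command: List[str]) -> str:
        # A real manager would launch `command` in a container here.
        app_id = f"app_{next(self._ids)}"
        self._apps[app_id] = "RUNNING"
        return app_id

    def poll(self, app_id: str) -> str:
        self._apps[app_id] = "FINISHED"
        return self._apps[app_id]


class ClusterExecutor:
    """Hypothetical executor: queues task commands, submits each one to the
    resource manager as its own application, and tracks state by polling."""

    TERMINAL = ("FINISHED", "FAILED", "KILLED")

    def __init__(self, rm: FakeResourceManager) -> None:
        self.rm = rm
        self.queued: List[Tuple[str, List[str]]] = []
        self.running: Dict[str, str] = {}  # task key -> application id
        self.states: Dict[str, str] = {}   # task key -> last known state

    def queue_command(self, key: str, command: List[str]) -> None:
        self.queued.append((key, command))

    def heartbeat(self) -> None:
        # Sync first: ask the resource manager about previously launched apps.
        for key, app_id in list(self.running.items()):
            state = self.rm.poll(app_id)
            self.states[key] = state
            if state in self.TERMINAL:
                del self.running[key]
        # Then launch: submit every queued command as a new application.
        while self.queued:
            key, command = self.queued.pop(0)
            self.running[key] = self.rm.submit(command)
            self.states[key] = "RUNNING"
```

With this toy manager, a queued task shows up as RUNNING after the first `heartbeat()` and as FINISHED after the second, at which point it is dropped from `running`.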
