Right!

Let me analyse the impact here and come up with a plan for how we can
expand on this area. As Ash mentioned earlier, it doesn't have to be a
committed item, but this is something that might call for an AIP(?) and
could be worked on outside the main tree?

Thanks & Regards,
Amogh Desai


On Tue, Oct 29, 2024 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> This all looks really good - sounds like something that we could do with just
> the K8S executor, and likely even make it compatible with Airflow 2 and
> release it independently
>
> On Tue, Oct 29, 2024 at 1:30 PM Amogh Desai <amoghdesai....@gmail.com>
> wrote:
>
> > > As I understand what it means - if I read it correctly, it's mostly a
> > > deployment issue - we don't even have to have a YuniKorn Executor - we can
> > > use the K8S Executor, and it will work out of the box, with scheduling
> > > controlled by YuniKorn, but then we need to find a way to configure the
> > > behaviour of tasks and dags (likely via annotations on pods maybe?). That
> > > would mean that it's mostly documentation on "How can I leverage YuniKorn
> > > with Airflow" + maybe a helm chart modification to install YuniKorn as an
> > > option?
> > >
> > > And then likely we need to add a little bit of metadata and some mapping of
> > > "task" or "dag" or "task group" properties to open up more capabilities of
> > > YuniKorn scheduling?
> > >
> > > Do I understand correctly?
> >
> > You mostly summed it up, but a few notes.
> > Yes, we can open up YuniKorn to schedule Airflow workloads by doing
> > basically nothing, or at most very little manual work.
> >
> > But to really enable YuniKorn to its full power, we will have to make some
> > changes to the Airflow codebase. A few things off the top of my head:
> > The admission controller will take care of the applicationId, scheduler
> > name, etc., but from an initial read, if we want things like "schedule dags
> > to a certain queue only" or something of that sort, we will need some labels
> > to be injected, or even a level above, get the KPO to add some labels, like
> > a queue.
> > OR
> > if we could specify the queue for every operator by extending the
> > BaseOperator, that would be cool too.
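> >
> > Just to make the idea concrete, here is a rough sketch (not tested - the
> > label keys and queue names are assumptions that need to be checked against
> > the YuniKorn docs) of how a task could already pass a queue hint through
> > the KubernetesExecutor's pod_override today:
> >
> > import pendulum
> > from kubernetes.client import models as k8s
> > from airflow import DAG
> > from airflow.operators.python import PythonOperator
> >
> > with DAG("yunikorn_queue_demo",
> >          start_date=pendulum.datetime(2024, 10, 1), schedule=None):
> >     PythonOperator(
> >         task_id="hello",
> >         python_callable=lambda: print("hello from a YuniKorn-scheduled pod"),
> >         # Label the worker pod so the YuniKorn admission controller /
> >         # scheduler can place it on a specific queue ("queue" and
> >         # "applicationId" label keys are assumptions).
> >         executor_config={
> >             "pod_override": k8s.V1Pod(
> >                 metadata=k8s.V1ObjectMeta(
> >                     labels={"queue": "root.default",
> >                             "applicationId": "yunikorn-queue-demo"}
> >                 )
> >             )
> >         },
> >     )
> >
> > With placement rules in the yunikorn-configs configmap even this much may
> > not be needed, but it shows the kind of metadata we are talking about.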
> >
> > I personally think that if we could extend the KubernetesExecutor into a
> > YunikornExecutor (naming doesn't matter to me), we could handle things like
> > installing YuniKorn along with Airflow by making changes to the helm chart,
> > so that it comes up with the scheduler, admission controller, etc. We would
> > also be able to make code changes in Airflow, controlling the internal
> > logic based on the executor type instead of leaving it all to the end user
> > (I mean options like the label injection, or labelling all the tasks of a
> > group as one application, to adhere to Jarek's thought).
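> >
> > As a sketch of what the "executor handles it" variant could look like
> > (purely illustrative - the dag-to-queue mapping, the worker pod label names
> > and the YuniKorn label keys are all assumptions), the existing
> > pod_mutation_hook in airflow_local_settings.py could inject the labels
> > cluster-wide without touching any DAG:
> >
> > # airflow_local_settings.py
> > from kubernetes.client import models as k8s
> >
> > # Assumed mapping of dags to YuniKorn queues.
> > DAG_TO_QUEUE = {"ml_training": "root.gpu", "reporting": "root.batch"}
> >
> > def pod_mutation_hook(pod: k8s.V1Pod) -> None:
> >     labels = (pod.metadata.labels or {}) if pod.metadata else {}
> >     # The KubernetesExecutor puts a dag_id label on worker pods
> >     # (exact label names can vary between Airflow versions).
> >     dag_id = labels.get("dag_id")
> >     if not dag_id:
> >         return
> >     labels["queue"] = DAG_TO_QUEUE.get(dag_id, "root.default")
> >     labels["applicationId"] = f"airflow-{dag_id}"
> >     pod.metadata.labels = labels
> >
> > A YunikornExecutor (or a helm flag) could own this logic instead of asking
> > every user to maintain such a hook themselves.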
> >
> > Manikandan, feel free to add anything more from the YuniKorn side in case
> > I have misinterpreted anything or just generally missed something :)
> >
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Tue, Oct 29, 2024 at 1:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > This is cool.
> > >
> > > As I understand what it means - if I read it correctly, it's mostly a
> > > deployment issue - we don't even have to have a YuniKorn Executor - we can
> > > use the K8S Executor, and it will work out of the box, with scheduling
> > > controlled by YuniKorn, but then we need to find a way to configure the
> > > behaviour of tasks and dags (likely via annotations on pods maybe?). That
> > > would mean that it's mostly documentation on "How can I leverage YuniKorn
> > > with Airflow" + maybe a helm chart modification to install YuniKorn as an
> > > option?
> > >
> > > And then likely we need to add a little bit of metadata and some mapping of
> > > "task" or "dag" or "task group" properties to open up more capabilities of
> > > YuniKorn scheduling?
> > >
> > > Do I understand correctly?
> > >
> > > > 1. YuniKorn treats applications at the DAG level, not at the task level,
> > > > which is great. Due to this, we can try to leverage the
> > > > gang scheduling abilities of YuniKorn.
> > >
> > > This is great. I was wondering if we could also allow the application at
> > > the "Task Group" level. I find it a really interesting feature to be able
> > > to treat a "Task Group" as an entity that maps to an "application" - this
> > > way you could treat the "Task Group" as a "schedulable entity" and, for
> > > example, set pre-emption properties for all tasks in the same task group.
> > > Or gang scheduling for the task group ("only schedule tasks in the task
> > > group when there are enough resources for the whole task group"). Or -
> > > and this is something that I think of as a "holy grail" of scheduling in
> > > the context of optimising machine learning workflows: "make sure that all
> > > the tasks in a group are scheduled on the same node and use the same local
> > > hardware resources" + if any of them fail, retry the whole group - also on
> > > the same instance. I think this is partially possible with some node
> > > affinity setup - but I would love it if we were able to set a property on
> > > a task group effectively meaning "execute all tasks in the group on the
> > > same hardware" - so a bit higher abstraction - and have YuniKorn handle
> > > all the pre-emption and optimisation of scheduling for that.
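> > >
> > > To make the "Task Group as application" idea a bit more concrete, a very
> > > rough sketch of what it could look like from the DAG author's side (the
> > > label keys, queue name and the whole approach are assumptions, not an
> > > agreed design):
> > >
> > > import pendulum
> > > from kubernetes.client import models as k8s
> > > from airflow import DAG
> > > from airflow.operators.python import PythonOperator
> > > from airflow.utils.task_group import TaskGroup
> > >
> > > def yunikorn_app(app_id: str, queue: str) -> dict:
> > >     # Tag a worker pod as belonging to one YuniKorn "application".
> > >     return {
> > >         "pod_override": k8s.V1Pod(
> > >             metadata=k8s.V1ObjectMeta(
> > >                 labels={"applicationId": app_id, "queue": queue}
> > >             )
> > >         )
> > >     }
> > >
> > > with DAG("ml_gang_demo",
> > >          start_date=pendulum.datetime(2024, 10, 1), schedule=None):
> > >     with TaskGroup(group_id="training"):
> > >         # Every worker in the group shares one applicationId, so the
> > >         # scheduler can treat the whole TaskGroup as a single schedulable
> > >         # entity (gang scheduling, preemption, co-location policies).
> > >         for i in range(3):
> > >             PythonOperator(
> > >                 task_id=f"worker_{i}",
> > >                 python_callable=lambda: print("training shard"),
> > >                 executor_config=yunikorn_app("ml-gang-demo-training",
> > >                                              "root.gpu"),
> > >             )
> > >
> > > A TaskGroup-level property that generates this (plus whatever gang
> > > scheduling annotations YuniKorn expects) automatically would be the
> > > "higher abstraction" described above.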
> > >
> > > > 2. With the admission controller running, even the older DAGs will be
> > > > able to benefit from the YuniKorn scheduling abilities without the need
> > > > to make changes to the DAGs. This means that the same DAG will run with
> > > > the default scheduler (K8s default) as well as YuniKorn if need be!
> > >
> > > Fantastic!
> > >
> > > > 3. As Mani mentioned, preemption capabilities can be explored due to
> > > > this as well.
> > > >
> > > > I am happy to work on this effort and am looking forward to it.
> > >
> > > Yeah, that would be cool - also see above. I think if we are able to have
> > > some "light touch" integration with YuniKorn, where we could handle a
> > > "Task Group" as a schedulable entity + have some higher-level abstractions
> > > / properties of it that map onto some "scheduling behaviour" -
> > > preemption/gang scheduling - and document it, that would be a great and
> > > easy way of expanding Airflow's capabilities - especially for ML workflows.
> > >
> > > J.
> > >
> > >
> > > On Tue, Oct 29, 2024 at 8:10 AM Amogh Desai <amoghdesai....@gmail.com>
> > > wrote:
> > >
> > > > Building upon the POC done by Manikandan, I tried my hand at an
> > > > experiment too.
> > > >
> > > > I mainly wanted to experiment with the YuniKorn admission controller,
> > > > with the aim of making no changes to my older DAGs.
> > > >
> > > > Deployed a setup that looks like this:
> > > >
> > > > - Deployed YuniKorn in a kind cluster with the default configuration.
> > > >   The default configuration launches the YuniKorn scheduler as well as
> > > >   an admission controller which watches for a `yunikorn-configs`
> > > >   configmap that can define queues, partitions, placement rules etc.
> > > > - Deployed Airflow using the helm chart in the same kind cluster while
> > > >   specifying the executor as KubernetesExecutor.
> > > >
> > > > Wanted to test whether YuniKorn can take over the scheduling of Airflow
> > > > workers. Created some queues using the config present here:
> > > > https://github.com/apache/yunikorn-k8shim/blob/master/deployments/examples/namespace/queues.yaml
> > > >
> > > >
> > > > Tried running the Airflow K8s executor example DAG
> > > > (https://github.com/apache/airflow/blob/main/airflow/example_dags/example_kubernetes_executor.py)
> > > > without any changes to the DAG.
> > > >
> > > > I was able to run the DAG successfully.
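> > > >
> > > > For anyone reproducing this, something like the snippet below can
> > > > confirm which scheduler actually placed the worker pods (the namespace
> > > > and label selector are assumptions for a stock helm deployment):
> > > >
> > > > from kubernetes import client, config
> > > >
> > > > config.load_kube_config()  # or load_incluster_config() in-cluster
> > > > v1 = client.CoreV1Api()
> > > > # Assumes Airflow lives in the "airflow" namespace and worker pods
> > > > # carry the standard KubernetesExecutor label.
> > > > pods = v1.list_namespaced_pod("airflow",
> > > >                               label_selector="kubernetes_executor=True")
> > > > for pod in pods.items:
> > > >     # schedulerName should read "yunikorn" once the admission
> > > >     # controller has rewritten it.
> > > >     print(pod.metadata.name, pod.spec.scheduler_name)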
> > > >
> > > >
> > > > Results
> > > >
> > > > 1. The task pods get scheduled by YuniKorn instead of the default K8s
> > > > scheduler.
> > > >
> > > > 2. I was able to observe a single application run for the Airflow DAG
> > > > in the YuniKorn UI.
> > > >
> > > > Observations
> > > >
> > > > 1. YuniKorn treats applications at the DAG level, not at the task level,
> > > > which is great. Due to this, we can try to leverage the gang scheduling
> > > > abilities of YuniKorn.
> > > >
> > > > 2. With the admission controller running, even the older DAGs will be
> > > > able to benefit from the YuniKorn scheduling abilities without the need
> > > > to make changes to the DAGs. This means that the same DAG will run with
> > > > the default scheduler (K8s default) as well as YuniKorn if need be!
> > > >
> > > > 3. As Mani mentioned, preemption capabilities can be explored due to
> > > > this as well.
> > > >
> > > > I am happy to work on this effort and am looking forward to it.
> > > >
> > > >
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
> > > >
> > > >
> > > > On Tue, Oct 15, 2024 at 4:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > > Hello here,
> > > > >
> > > > > *TL;DR: I would love to start a discussion about creating (for
> > > > > Airflow 3.x - it does not have to be Airflow 3.0) a new community
> > > > > executor based on YuniKorn*
> > > > >
> > > > > You might remember my point about "replacing the Celery Executor"
> > > > > when I raised the Airflow 3 question. I never actually "meant" to
> > > > > replace (and remove) the Celery Executor; I was more on a quest to
> > > > > see if we have a viable alternative.
> > > > >
> > > > > And I think we have one with Apache YuniKorn:
> > > > > https://yunikorn.apache.org/
> > > > >
> > > > > While it is not a direct replacement (so I'd say it should be an
> > > > > additional executor), I think YuniKorn can provide us with a number
> > > > > of features that we currently cannot give to our users. From the
> > > > > discussions I had and the talk I saw at Community Over Code in
> > > > > Denver, I believe it is something that might make Airflow more
> > > > > capable, especially in the "optimization wars" context that I wrote
> > > > > about in
> > > > > https://lists.apache.org/thread/1mp6jcfvx67zd3jjt9w2hlj0c5ysbh8r
> > > > >
> > > > > It seems like quite a good fit for the "Inference" use case that we
> > > > > want to support for Airflow 3.
> > > > >
> > > > > At Community Over Code I attended a talk from Apple engineers (and
> > > > > had quite a nice follow-up discussion) named "Maximizing GPU
> > > > > Utilization: Apache YuniKorn Preemption", and had a very long
> > > > > discussion with Cloudera people who have been using YuniKorn for
> > > > > years to optimize their workloads.
> > > > >
> > > > > The presentation was not recorded, but I will try to get the slides
> > > > > and send them your way.
> > > > >
> > > > > I think we should take a close look at it - because it seems to save
> > > > > a ton of implementation effort for the Apple team running batch
> > > > > inference for their multi-tenant internal environment - which I think
> > > > > is precisely what you want to do.
> > > > >
> > > > > YuniKorn (https://yunikorn.apache.org/) is an "app-aware" scheduler
> > > > > that has a number of queue / capacity management models and policies
> > > > > that allow controlling various applications competing for GPUs from
> > > > > a common pool.
> > > > >
> > > > > They mention things like:
> > > > >
> > > > > * Gang scheduling / gang scheduling with preemption, where there are
> > > > > workloads requiring a minimum number of workers
> > > > > * Support for latency-sensitive workloads
> > > > > * Resource quota management - things like priorities of execution
> > > > > * YuniKorn preemption - with guaranteed capacity and preemption when
> > > > > needed - which improves utilisation
> > > > > * Preemption that minimizes preemption cost (pod-level preemption
> > > > > rather than application-level preemption) - very customizable
> > > > > preemption with opt-in/opt-out, queues, resource weights, fencing,
> > > > > support for fifo/lifo sorting etc.
> > > > > * Runs in the cloud and on-premise
> > > > >
> > > > > The talk described quite a few scenarios of preemption, utilization,
> > > > > guaranteed resources, etc. They also outlined what new features
> > > > > YuniKorn is working on (intra-queue preemption etc.) and what could
> > > > > be done in the future.
> > > > >
> > > > >
> > > > > Coincidentally - Amogh Desai and a friend submitted a talk for the
> > > > > Airflow Summit:
> > > > >
> > > > > "A Step Towards Multi-Tenant Airflow Using Apache YuniKorn"
> > > > >
> > > > > It did not make it to the Summit (another talk of Amogh's did) - but
> > > > > I think back then we had not realized the potential of utilising
> > > > > YuniKorn to optimize workflows managed by Airflow.
> > > > >
> > > > > But we seem to have people in the community who know more about the
> > > > > YuniKorn <-> Airflow relation (Amogh :) ) and could probably comment
> > > > > and add some "from the trenches" experience to the discussion.
> > > > >
> > > > > Here is the description of the talk that Amogh submitted:
> > > > >
> > > > > Multi-tenant Airflow is hard, and there have been novel approaches
> > > > > in the recent past to close this gap. A key obstacle in multi-tenant
> > > > > Airflow is the management of cluster resources. This is crucial to
> > > > > prevent one malformed workload from hijacking an entire cluster. It
> > > > > is also vital to restrict users and groups from monopolizing
> > > > > resources in a shared cluster with their workloads.
> > > > >
> > > > > To tackle these challenges, we turn to Apache YuniKorn, a K8s
> > > > > scheduler catering to all kinds of workloads. We leverage YuniKorn's
> > > > > hierarchical queues in conjunction with resource quotas to establish
> > > > > multi-tenancy both at the shared namespace level and within
> > > > > individual namespaces where Airflow is deployed.
> > > > >
> > > > > YuniKorn also introduces Airflow to a new dimension of preemption.
> > > > > Now, Airflow workers can preempt resources from lower-priority jobs,
> > > > > ensuring critical schedules in our data pipelines are met without
> > > > > compromise.
> > > > >
> > > > > Join us for a discussion on integrating Airflow with YuniKorn,
> > > > > unraveling solutions to these multi-tenancy challenges. We will also
> > > > > share our past experiences while scaling Airflow and the steps we
> > > > > have taken to handle real-world production challenges in equitable
> > > > > multi-tenant K8s clusters.
> > > > >
> > > > > I would love to hear what you think about it. I know we are deep
> > > > > into the Airflow 3.0 implementation - but this one can be discussed
> > > > > and implemented independently, and maybe it's a good idea to start
> > > > > on it sooner rather than later if we see that it has good potential.
> > > > >
> > > > > J.
> > > > >
> > > >
> > >
> >
>
