Hi Ping,
(The dev list doesn't allow attachments, so none of the images you posted
came through; some of my questions may already be answered by them.)
It seems that many of the goals here overlap with AIP-1, the proposed
separation of the dag processor from the scheduler, and the multi-tenancy
work in general.
Your description of how the scheduler and DAG parsing process operate
is based on the 1.10 mode of operation, but that has changed in 2.0 -- the
scheduler _only_ operates on the serialized representation and doesn't
need the result of the DAG parsing process. Breaking this tight
coupling was one of the major speed-ups I achieved.
The exact details aren't yet clear from your email, but here are my initial
comments:
1. Runtime isolation of task execution is already possible by using the
KubernetesExecutor (see the sketch after this list).
2. Running short-lived processes (such as what I think you are proposing
for dag parsing) in a Kube cluster isn't really practical, as the spin-up
time of pods is highly variable and can be on the order of minutes.
3. Not everyone has Docker available or is comfortable running it -- we
100% still need to support running without Docker or containers.
4. Many of our users are Data Scientists or Engineers, and so aren't
happy with building containers.
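For illustration, a minimal sketch of per-task runtime isolation with the
KubernetesExecutor (the DAG name, task id, and image name are made up):

```python
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("isolated_runtime_example",
         start_date=datetime(2021, 12, 1),
         schedule_interval=None) as dag:
    BashOperator(
        task_id="run_in_custom_runtime",
        bash_command="echo running in a custom image",
        # With the KubernetesExecutor every task already runs in its own pod,
        # and a single task can override its pod spec to use a different
        # image. The image name below is illustrative, not a real one.
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        # The task container must keep the name "base" for
                        # the override to apply.
                        k8s.V1Container(name="base",
                                        image="example.com/etl-runtime:1.2"),
                    ]
                )
            )
        },
    )
```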
On Thu, Dec 16 2021 at 15:52:02 -0800, Ping Zhang <[email protected]>
wrote:
Hi Airflow Community,
This is Ping Zhang from the Airbnb Airflow team. We would like to
open source our internal feature: docker runtime isolation for
airflow tasks. It has been in our production for close to 1 year and
it is very stable.
I will create an AIP after the discussion.
Thanks,
Ping
Motivation
An Airflow worker host is a shared resource for all tasks running on it,
so hosts must provision the dependencies of every task, including
system-level and Python application-level dependencies. This leads to a
very fat runtime, and therefore long host provisioning times and low
elasticity in the worker resource. That makes it challenging to prepare
for unexpected burst load, such as a large backfill or a rerun of large
DAGs.
The lack of runtime isolation also makes operations such as adding or
upgrading system and Python dependencies challenging and risky, and
removing any dependency is almost impossible. It incurs a lot of
additional operating cost for the team, since users do not have
permission to add or upgrade Python dependencies themselves and must
coordinate with us. When package versions conflict, the dependencies
cannot be installed directly on the host at all; users have to fall back
to PythonVirtualenvOperator, which slows down their development cycle.
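For context, the workaround users fall back to today looks roughly like
this (the callable, requirement pin, and task id are just examples):

```python
from airflow.operators.python import PythonVirtualenvOperator

def do_work():
    # Imports must live inside the callable because it runs in a freshly
    # created virtualenv, not in the host environment.
    import pandas
    print(pandas.__version__)

# A per-task virtualenv, rebuilt on every run, just to work around a
# version conflict on the shared host. The pinned version is an example.
task_with_conflicting_deps = PythonVirtualenvOperator(
    task_id="task_with_conflicting_deps",
    python_callable=do_work,
    requirements=["pandas==1.3.5"],
    system_site_packages=False,
)
```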
What change do you propose to make?
To solve those problems, we propose introducing runtime isolation for
Airflow tasks, leveraging Docker as the task runtime environment.
There are several benefits:
- Provides runtime isolation at the task level
- Allows a customized runtime for parsing DAG files
- Keeps a lean runtime on the Airflow host, which enables high worker
resource elasticity
- Gives an immutable and portable task execution runtime
- Ensures, via process isolation, that all subprocesses of a task are
cleaned up after Docker exits (we have seen orphaned Hive and Spark
subprocesses left behind after the airflow run process exits)
Changes
Airflow Worker
In the new design, the `airflow run local` and `airflow run raw`
processes run inside a Docker container launched by the Airflow worker.
This way, the Airflow worker runtime only needs the minimal requirements
to run Airflow core and Docker.
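A minimal sketch of how the worker could launch the task container (the
use of the docker Python SDK, the mount path, and the CLI flags are
assumptions for illustration, not the actual implementation):

```python
import docker  # docker SDK for Python; its use here is an assumption

def run_task_in_container(image, dag_id, task_id, execution_date):
    # Sketch only: the worker runs the `airflow run ... --local` process
    # inside a container rather than forking it on the host. Image name,
    # mount paths, and flags are illustrative.
    client = docker.from_env()
    return client.containers.run(
        image,
        command=["airflow", "run", dag_id, task_id, execution_date, "--local"],
        volumes={"/usr/local/airflow/dags":
                 {"bind": "/usr/local/airflow/dags", "mode": "ro"}},
        remove=True,  # all of the task's subprocesses die with the container
    )
```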
Airflow Scheduler
Instead of processing the DAG file directly, the DagFileProcessor process:
1. launches the Docker container required by that DAG file to process it,
and persists the serializable DAGs (SimpleDags) to a file so that the
result can be read outside the Docker container
2. reads the file persisted from the Docker container, deserializes it,
and puts the result into the multiprocessing queue
This ensures the DAG parsing runtime is exactly the same as the DAG
execution runtime.
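A rough sketch of that processing step (the in-container command, the file
layout, and the pickle serialization are hypothetical, shown only to
illustrate the flow):

```python
import pickle
import docker

def process_dag_file_in_container(image, dag_file_path, result_queue):
    # Sketch only: parse the DAG file inside a container and hand the
    # serialized result back to the DagFileProcessor on the host. The
    # in-container command, paths, and pickle format are illustrative.
    result_path = dag_file_path + ".parsed"
    client = docker.from_env()

    # 1. Launch the container required by this DAG file; it parses the file
    #    and persists the serializable DAGs (SimpleDags) to a shared file.
    client.containers.run(
        image,
        command=["airflow", "process_dag_file", dag_file_path,
                 "--output", result_path],
        volumes={"/usr/local/airflow/dags":
                 {"bind": "/usr/local/airflow/dags", "mode": "rw"}},
        remove=True,
    )

    # 2. Read the persisted file outside the container, deserialize it, and
    #    put the result onto the multiprocessing queue for the parsing loop.
    with open(result_path, "rb") as f:
        simple_dags = pickle.load(f)
    result_queue.put(simple_dags)
```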
This requires a way for a DAG definition file to tell the DAG file
processing loop which Docker image to use to process it. We can easily
achieve this with a metadata file alongside the DAG definition file that
defines the Docker runtime. To ease the burden on users, a default Docker
image is used when a DAG definition file does not require a customized
runtime.
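For example, the processing loop could resolve the image along these lines
(the sidecar naming convention, its JSON format, and the default image
name are all hypothetical):

```python
import json
import os

DEFAULT_IMAGE = "airflow-default-runtime:latest"  # hypothetical default image

def image_for_dag_file(dag_file_path):
    # Hypothetical convention: `my_dag.py` may ship with `my_dag.runtime.json`
    # naming its Docker image; otherwise the default image is used.
    metadata_path = os.path.splitext(dag_file_path)[0] + ".runtime.json"
    if os.path.exists(metadata_path):
        with open(metadata_path) as f:
            return json.load(f).get("image", DEFAULT_IMAGE)
    return DEFAULT_IMAGE
```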
As a Whole
Best wishes
Ping Zhang