Let me confirm I'm understanding this right: we're talking specifically about the CeleryExecutor not starting an `airflow run` (not --raw) command, and firing up a LocalTaskJob instead? Then we'd still have the worker fire up the `airflow run --raw` command?
Seems reasonable. One thing to keep in mind is the fact that shelling out guarantees no `sys.module` caching, which is a real issue for slowly changing DAG definitions. That's the reason why we'd have to reboot the scheduler periodically before it used sub-processes to evaluate DAGs. Any code that needs to evaluate a DAG should probably be done in a subprocess.

Shelling out also allows for doing things like unix impersonation and applying cgroups. This currently happens between `airflow run` and `airflow run --raw`. The parent process also does the heartbeat and listens for external kill signals (kill pills).

I think what we want is smarter executors and only one level of bash command: the `airflow run --raw`. Ideally the system that fires this up is not Airflow itself and cannot be DAG-aware (or it will need to get restarted to flush the cache).

To me that really brings up the whole question of what should be handled by the Executor, and what belongs in core Airflow. The Executor needs to do more, and Airflow core less.

When you think about how this should all work on Kubernetes, it looks something like this:

* the scheduler, through the KubeExecutor, calls the k8s API and tells it to fire up an Airflow task
* a container boots up and starts an `airflow run --raw` command
* k8s handles heartbeats, monitors tasks, and knows how to kill a running task
* the scheduler process (call it the supervisor) talks with k8s through the KubeExecutor and handles zombie cleanup and sending kill pills

Now, because Celery doesn't offer as many guarantees, it gets a bit more tricky. Is there even a way to send a kill pill through Celery? Are there other ways than using a parent process to accomplish this?

At a higher level, it seems like we need to move more logic from core Airflow into the executors. For instance, the heartbeat construct should probably be 100% handled by the executor, and not an assumption in the core code base.

I think I drifted a bit, hopefully that's still helpful.

Max

On Sat, Aug 4, 2018 at 6:42 AM Dan Davydov <[email protected]> wrote:

> Alex (cc'd) brought this up to me a while ago too, and I agreed with him.
> It is definitely something we should do. I remember there were some things
> that were a bit tricky about removing the intermediate process and would
> be a bit of work to fix (something about the tasks needing to heartbeat
> the parent process, maybe?).
>
> TLDR: No blockers from me, just might be a bit of work to implement.
>
> On Sat, Aug 4, 2018 at 9:15 AM Bolke de Bruin <[email protected]> wrote:
>
> > Hi Max, Dan et al,
> >
> > Currently, when a scheduled task runs, this happens in three steps:
> >
> > 1. Worker
> > 2. LocalTaskJob
> > 3. Raw task instance
> >
> > It uses (by default) 5 (!) different processes:
> >
> > 1. Worker
> > 2. Bash + Airflow
> > 3. Bash + Airflow
> >
> > I think we can merge the worker and LocalTaskJob, as the latter seems to
> > exist only to track a particular task. This can be done within the worker
> > without side effects. Next to that, I think we can limit the number of
> > (airflow) processes to 2 if we remove the bash dependency. I don't see
> > any reason to depend on bash.
> >
> > Can you guys shed some light on what the thoughts were around those
> > choices? Am I missing anything on why they should exist?
> >
> > Cheers,
> > Bolke
> >
> > Sent from my iPad
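As an aside on the current two-level `airflow run` / `airflow run --raw` split discussed above: the supervision pattern reduces to something like the sketch below. This is illustrative only and not the actual Airflow code; `heartbeat_callback` is a hypothetical hook standing in for whatever records that the task is still alive.

```python
# Illustrative sketch only: a supervising parent that shells out to the raw
# task, heartbeats while it runs, and forwards a kill pill to the child.
import signal
import subprocess
import time


def run_with_supervision(cmd, heartbeat_callback, heartbeat_sec=5.0):
    """cmd is something like ["airflow", "run", "--raw", dag_id, task_id, date]."""
    child = subprocess.Popen(cmd)  # only this child process imports DAG code
    try:
        while child.poll() is None:
            heartbeat_callback()   # hypothetical hook: record "still alive"
            time.sleep(heartbeat_sec)
    except KeyboardInterrupt:
        child.send_signal(signal.SIGTERM)  # pass the kill pill down to the task
        child.wait()
    return child.returncode
```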
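To make the Kubernetes flow sketched in the bullets above concrete, here is a rough executor-side sketch using the official `kubernetes` Python client. The image name, namespace, pod naming, and command layout are placeholders, not necessarily what an actual KubeExecutor does.

```python
# Sketch: a scheduler-side executor asking the k8s API to run a single task.
# Everything DAG-aware happens inside the container via `airflow run --raw`;
# the pod is the unit that k8s monitors and knows how to kill.
from kubernetes import client, config


def launch_task_pod(dag_id, task_id, execution_date):
    config.load_kube_config()  # or load_incluster_config() when running in-cluster
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            name=f"{dag_id}-{task_id}".lower().replace("_", "-")
        ),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="base",
                    image="airflow:latest",  # placeholder image
                    command=["airflow", "run", "--raw", dag_id, task_id, execution_date],
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```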

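On the question of sending a kill pill through Celery: Celery does expose task revocation, which could in principle play that role without a local parent process. A minimal sketch, assuming `app` is the Celery application the workers are connected to (the name and broker URL below are placeholders) and `celery_task_id` is the id Celery assigned when the task was queued:

```python
# Sketch: using Celery's revoke call as a kill pill.
from celery import Celery

app = Celery("airflow_worker", broker="redis://localhost:6379/0")


def send_kill_pill(celery_task_id):
    # terminate=True asks the worker to signal the process running the task
    # (SIGTERM here), rather than merely discarding it from the queue.
    app.control.revoke(celery_task_id, terminate=True, signal="SIGTERM")
```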