> On 4 Aug 2018, at 21:25, Bolke de Bruin <[email protected]> wrote:
>
> We can execute “python” just fine. Because it will run in a separate
> interpreter, no issues will come from sys.modules, as that is not
> inherited. It will still parse DAGs in a separate process then. Forking
> (@ash) probably does not work, as that does share sys.modules.
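A quick illustration of the sys.modules point, using the stdlib `csv`
module as a stand-in for a DAG file (os.fork is Unix-only):

    import os
    import subprocess
    import sys

    import csv  # stand-in for a DAG file / heavy module

    # The parent interpreter now has csv cached in sys.modules.
    print("csv" in sys.modules)  # True

    # A freshly exec'd interpreter starts with a clean module cache, so it
    # always re-reads what is on disk -- which is why a separate "python"
    # process picks up DAG edits that a cached in-process parse would miss.
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('csv' in sys.modules)"]
    )
    print(out.decode().strip())  # False

    # A fork, by contrast, duplicates the parent, sys.modules included.
    pid = os.fork()
    if pid == 0:
        print("child:", "csv" in sys.modules)  # True, inherited from parent
        os._exit(0)
    os.waitpid(pid, 0)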
Some sharing of modules was my idea - if we are careful about which modules
we load, and we only load the Airflow core pre-fork and don't parse any DAG
pre-fork, then forking with the currently loaded modules shared is a good
thing for speed. Think of it like the preload_app option to a gunicorn
worker, where the master loads the app and then forks (a rough sketch of
this is below).

> [snip]
>
> I’m writing AIP-2
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-2+Simplify+process+launching
> to work this out.

Sounds good. I'm not proposing we try my forking idea yet, and your
proposal is a definite improvement on where we are now.

>
> B.
>
> Sent from my iPad
>
>> On 4 Aug 2018, at 19:40, Ash Berlin-Taylor
>> <[email protected]> wrote:
>>
>> Comments inline.
>>
>>> On 4 Aug 2018, at 18:28, Maxime Beauchemin <[email protected]>
>>> wrote:
>>>
>>> Let me confirm I'm understanding this right: we're talking specifically
>>> about the CeleryExecutor not starting an `airflow run` (not --raw)
>>> command, and firing up a LocalTaskJob instead? Then we'd still have the
>>> worker fire up the `airflow run --raw` command?
>>>
>>> Seems reasonable. One thing to keep in mind is the fact that shelling
>>> out guarantees no `sys.modules` caching, which is a real issue for
>>> slowly changing DAG definitions. That's the reason why we'd have to
>>> reboot the scheduler periodically before it used sub-processes to
>>> evaluate DAGs. Any code that needs to evaluate a DAG should probably be
>>> done in a subprocess.
>>
>>> Shelling out also allows for doing things like unix impersonation and
>>> applying cgroups. This currently happens between `airflow run` and
>>> `airflow run --raw`. The parent process also does the heartbeat and
>>> listens for external kill signals (kill pills).
>>>
>>> I think what we want is smarter executors and only one level of bash
>>> command: the `airflow run --raw`, and ideally the system that fires
>>> this up is not Airflow itself, and cannot be DAG-aware (or it will need
>>> to get restarted to flush the cache).
>>
>> Rather than shelling out to `airflow run`, could we instead fork and run
>> the CLI code directly? Shelling out involves parsing the config twice,
>> loading all of the Airflow and SQLAlchemy deps twice, etc. I think this
>> would account for a not-insignificant speed difference in the unit
>> tests. In the case of impersonation we'd probably have no option but to
>> exec `airflow`, but most(?) people don't use that?
>>
>> Avoiding the extra parsing penalty and process when we don't need them
>> might be worth it for the test speed-up alone. And we've already got
>> impersonation covered in the tests, so we'll know that it still works.
>>
>>> To me that really brings up the whole question of what should be
>>> handled by the Executor, and what belongs in core Airflow. The Executor
>>> needs to do more, and Airflow core less.
>>
>> I agree with the sentiment that core should do less and executors more
>> -- many parts of the core are reimplementing what Celery itself could
>> do.
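To make that preload-then-fork idea concrete, here is a minimal sketch --
`parse_and_run` is a hypothetical stand-in for "parse one DAG file, execute
one task", not a real Airflow API:

    import os

    # Pre-fork: import only the stable framework core here (the airflow
    # package and its deps) -- never any user DAG code, so nothing stale
    # can sit in sys.modules before we fork.

    def parse_and_run(dag_file, task_id):
        # Hypothetical stand-in for "parse this one DAG file, run this
        # one task" (roughly DagBag + task execution in real Airflow).
        print(f"child {os.getpid()}: parsing {dag_file}, running {task_id}")

    def launch_task(dag_file, task_id):
        pid = os.fork()
        if pid == 0:  # child: inherits the preloaded core copy-on-write
            try:
                parse_and_run(dag_file, task_id)
            finally:
                os._exit(0)  # never return into the parent's control flow
        return pid  # parent keeps the pid to heartbeat / monitor / kill

    if __name__ == "__main__":
        os.waitpid(launch_task("dags/example.py", "my_task"), 0)

The child always parses the DAG fresh, so the only sys.modules sharing is
the core we deliberately preloaded -- the same behaviour preload_app gives
gunicorn.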
>>
>>> When you think about how this should all work on Kubernetes, it looks
>>> something like this:
>>> * the scheduler, through the KubeExecutor, calls the k8s API and tells
>>>   it to fire up an Airflow task
>>> * the container boots up and starts an `airflow run --raw` command
>>> * k8s handles heartbeats, monitors tasks, and knows how to kill a
>>>   running task
>>> * the scheduler process (call it the supervisor) talks with k8s through
>>>   the KubeExecutor and handles zombie cleanup and sending kill pills
>>>
>>> Now, because Celery doesn't offer as many guarantees, it gets a bit
>>> more tricky. Is there even a way to send a kill pill through Celery?
>>> Are there other ways than using a parent process to accomplish this?
>>
>> It does:
>> http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
>> (at least it does now)
>>
>>> At a higher level, it seems like we need to move more logic from core
>>> Airflow into the executors. For instance, the heartbeat construct
>>> should probably be 100% handled by the executor, and not an assumption
>>> in the core code base.
>>>
>>> I think I drifted a bit; hopefully that's still helpful.
>>>
>>> Max
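For what it's worth, the k8s flow Max describes could be sketched with the
official kubernetes Python client roughly like this -- image name,
namespace, and the dag/task/date arguments are all placeholders, not a real
KubeExecutor implementation:

    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() in-cluster

    # One pod per task instance; the container only runs `airflow run --raw`.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="example-dag-example-task"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="task",
                image="my-airflow:latest",  # placeholder image
                command=["airflow", "run", "--raw",
                         "example_dag", "example_task",
                         "2018-08-04T00:00:00"],
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

    # k8s now owns liveness: the supervisor watches the pod phase, and the
    # kill pill is delete_namespaced_pod() on the same name.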
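And the revoke call from that Celery doc, for reference (broker URL and
task id are placeholders):

    from celery import Celery

    app = Celery(broker="redis://localhost:6379/0")  # placeholder broker

    # terminate=True sends the signal to the worker process currently
    # executing the task -- effectively a kill pill rather than just
    # removing it from the queue.
    app.control.revoke("<task-uuid>", terminate=True, signal="SIGTERM")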
