Hi all, I did a demo of Airflow for an organisation that currently uses Azkaban. They liked the project and expressed interest in using it. The installation, however, was considered a bit more work than they wanted: a MySQL DB, Celery, RabbitMQ and a scheduler all had to be puppetized and maintained.
Since they also use Google Cloud, I did a short investigation into the effort required to run this 'natively' on the cloud at the press of a button, with a deployment file that takes care of the entire installation and the backup requirements. I talked about this with some members of this list and they suggested I subscribe and describe my ideas.

In this setup, MySQL becomes Cloud SQL (effectively the same thing), the scheduler runs on a Compute Engine instance, and Google Cloud Pub/Sub is used together with a custom GCloudExecutor to perform the work, similar to how MesosExecutor is a contribution in the contrib folder. The reason to make this a separate executor rather than adding Pub/Sub as a message queue for Celery is that the executor can become smarter in the future: it could spin up instances as required when queues fill up, or when no suitable workers are available for particular jobs that need a lot of CPU, GPU or memory.

I wrote a full manual of how this would be installed here. Note it's a spike (a 6-hour tryout), so it's not supposed to be final yet:

https://docs.google.com/document/d/1AarL24kaIZ4-PWVovEgj1Q7tlb14QUABOZEQ0c-U208/edit?usp=sharing

The way I understand things, it's possible to come up with a managed deployment that can eventually be run from Google Cloud Launcher with a minimum of effort, and I see this as the end goal of this effort:

https://console.cloud.google.com/launcher

Looking forward to your thoughts on how I can contribute and what I should do next.

Rgds,
Gerard
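To make the executor idea concrete, here is a rough sketch of the message flow I have in mind: the executor serializes each task into a Pub/Sub payload, and a worker on a compute instance pulls the payload and runs the command. All names here (encode_task_message, dispatch, handle_message) are hypothetical, not existing Airflow or Google Cloud APIs, and the publish callable stands in for a real Pub/Sub client so the sketch runs without any cloud libraries:

```python
import json

# Hypothetical sketch of how a GCloudExecutor could hand work to Pub/Sub.
# None of these names come from Airflow or the Google Cloud client libraries.

def encode_task_message(dag_id, task_id, execution_date, command):
    """Serialize one task instance into a Pub/Sub message payload (bytes)."""
    return json.dumps({
        "dag_id": dag_id,
        "task_id": task_id,
        "execution_date": execution_date,
        "command": command,
    }).encode("utf-8")

def dispatch(publish, dag_id, task_id, execution_date, command):
    """Publish a task to the work topic.

    `publish` stands in for a Pub/Sub client's publish call on a topic,
    injected here so the sketch stays runnable without cloud credentials.
    """
    publish(encode_task_message(dag_id, task_id, execution_date, command))

def handle_message(data, run):
    """Worker side: decode one pulled message and run its command.

    `run` stands in for whatever actually executes the airflow CLI command
    on the worker instance (e.g. a subprocess call).
    """
    msg = json.loads(data.decode("utf-8"))
    run(msg["command"])
    return msg
```

A smarter executor would sit between these two ends, watching queue depth and instance availability before deciding where (or whether) to publish, which is what Celery alone can't easily do.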
