Hi all,

I did a demo of Airflow for an organisation that currently uses Azkaban;
they liked the project and expressed interest in using it. The
installation, however, was considered a bit more work than they wanted: a
MySQL db, Celery, RabbitMQ and the scheduler all had to be puppetized and
maintained.

Since they also use Google Cloud, I did a short investigation into the
effort required to run this 'natively' on the cloud at the press of a
button, with a deployment file that takes care of the entire installation
and backup requirements. I talked about this with some members on this
list and they suggested I subscribe and describe my ideas.
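
To make that concrete, here is a rough sketch of what such a deployment
file could look like, using Deployment Manager's Python template format.
All resource names, zones, tiers and images below are placeholders, not a
final design:

def GenerateConfig(context):
    return {'resources': [
        {
            # Compute instance that runs the Airflow scheduler.
            'name': 'airflow-scheduler',
            'type': 'compute.v1.instance',
            'properties': {
                'zone': 'europe-west1-b',
                'machineType':
                    'zones/europe-west1-b/machineTypes/n1-standard-1',
                'disks': [{
                    'deviceName': 'boot',
                    'boot': True,
                    'autoDelete': True,
                    'initializeParams': {
                        'sourceImage': 'projects/debian-cloud/global/'
                                       'images/family/debian-8',
                    },
                }],
                'networkInterfaces': [
                    {'network': 'global/networks/default'},
                ],
            },
        },
        {
            # Cloud SQL instance backing the Airflow metadata database.
            'name': 'airflow-metadata-db',
            'type': 'sqladmin.v1beta4.instance',
            'properties': {
                'region': 'europe-west1',
                'settings': {'tier': 'db-n1-standard-1'},
            },
        },
        {
            # Pub/Sub topic the executor publishes task commands to.
            'name': 'airflow-task-topic',
            'type': 'pubsub.v1.topic',
            'properties': {'topic': 'airflow-tasks'},
        },
    ]}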

In this setup, MySQL becomes Cloud SQL (effectively the same thing), the
scheduler runs on a Compute Engine instance, and Google Cloud Pub/Sub is
used together with a custom GCloudExecutor to perform the work, similar to
how the MesosExecutor is a contribution in the contrib folder. The reason
to make this a separate executor rather than adding Pub/Sub as a message
queue for Celery is that the executor can become smarter in the future: it
could spin up instances as required when queues are filling up, or when no
suitable workers are available for particular jobs that need a lot of CPU,
GPU or memory.
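
For illustration, a minimal skeleton of what such an executor could look
like, assuming the google-cloud-pubsub client library. The class name,
project and topic ids are just placeholders; workers would subscribe to
the topic, run the command and write task state back to the metadata db:

import json

from google.cloud import pubsub_v1

from airflow.executors.base_executor import BaseExecutor


class GCloudExecutor(BaseExecutor):
    """Hands task commands to workers through a Cloud Pub/Sub topic."""

    def start(self):
        self.publisher = pubsub_v1.PublisherClient()
        # Placeholder project and topic ids.
        self.topic_path = self.publisher.topic_path(
            'my-gcp-project', 'airflow-tasks')

    def execute_async(self, key, command, queue=None):
        # Serialise the task key and CLI command into the message payload.
        payload = json.dumps({'key': key, 'command': command}, default=str)
        self.publisher.publish(self.topic_path, data=payload.encode('utf-8'))

    def sync(self):
        # This is where a smarter version would watch queue depth and
        # spin up or tear down worker instances as needed.
        pass

    def end(self):
        self.sync()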


I wrote a full manual of how this would be installed here. Note it's a
spike (a 6-hour tryout), so it's not meant to be a final thing yet:

https://docs.google.com/document/d/1AarL24kaIZ4-PWVovEgj1Q7tlb14QUABOZEQ0c-U208/edit?usp=sharing


As I understand it, it's possible to come up with a managed deployment
that can eventually be run from Google Cloud Launcher with minimal effort,
and I see that as the end goal of this effort:

https://console.cloud.google.com/launcher


Looking forward to your thoughts on how I can contribute and what I should
do next.

Rgds,

Gerard
