Hey,

As Bolke said, with the LocalExecutor (LE) and tasks consuming variable amounts of memory, you can run into memory issues on a container. I'd reconsider running in a containerized environment at all: with the LE, the scheduler and all task processes run in the same place, so you'd have to provision one very large container for it to work. You're probably better off on a plain EC2 instance for that. With the LE you don't need Redis at all; Redis serves as the broker/back-end for the CeleryExecutor, not the LocalExecutor.
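To make that concrete, here's a rough airflow.cfg sketch (host names and credentials are placeholders, and exact key names can vary a bit between Airflow versions). With the LocalExecutor you only point Airflow at the metadata database; the [celery] block only matters if you later switch to the CeleryExecutor, and [core] parallelism is the main knob for capping how many task processes the LocalExecutor runs at once on a single box:

    [core]
    executor = LocalExecutor
    # metadata database, e.g. your RDS postgres instance
    sql_alchemy_conn = postgresql+psycopg2://airflow:PASSWORD@your-rds-host:5432/airflow
    # upper bound on concurrently running task processes
    parallelism = 8

    # only needed if you move to the CeleryExecutor
    [celery]
    broker_url = redis://your-redis-host:6379/0
    celery_result_backend = db+postgresql://airflow:PASSWORD@your-rds-host:5432/airflow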
We used the CeleryExecutor with Redis in a spike on ECS. Indeed, logging is the biggest issue here. We used static IPs and hostnames for the containers we started (which doesn't necessarily make them "cattle"). We closed that off and used Splunk to get all logging output into a centralized location. I didn't spend enough time to think through all the implications there, though: the web UI is handy for seeing the log output of a specific task run, for example, and through Splunk you lose that view.

There were issues with memory usage and OOM kills. Memory gets reserved by the container, so if anything restarts or gets unstable, look at that first (rough sketch of the relevant ECS memory settings at the bottom of this mail).

To synchronize DAGs across all VMs, we experimented with EFS (works like NFS); the idea was to let CI deploy onto it as the single writer.

Rgds,
Gerard

On Thu, Nov 2, 2017 at 6:55 PM, Shoumitra Srivastava <[email protected]>
wrote:

> Hi guys,
>
> So far we have had a lot of success testing out Airflow and we are now
> going for a full scale deployment. To that end, we are considering
> dockerizing airflow and deploying it on one of our ECS clusters. We are
> planning on separating out the web server and the scheduler to separate
> tasks and using local executor with an RDS postgres and redis backend. Does
> anyone else have any suggestions regarding the setup? Any design patterns
> or good practices and gotchas would be welcome.
>
> -Shoumitra
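PS on the OOM point: below is a hedged sketch of the memory settings in an ECS container definition (the name, image, and numbers are made-up placeholders). "memory" is the hard limit at which ECS will OOM-kill the container, and "memoryReservation" is the soft reservation used for placement; with the LocalExecutor the scheduler container needs headroom for the scheduler plus every concurrently running task process.

    {
      "name": "airflow-scheduler",
      "image": "your-registry/airflow:latest",
      "command": ["airflow", "scheduler"],
      "memoryReservation": 2048,
      "memory": 6144
    }

If task memory can spike past the hard limit, you'll see exactly the restarts described above, so either raise "memory" or cap [core] parallelism in airflow.cfg.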
