Great work on this, I know how much time you dedicated to it.

Regards,
Kaxil

On Tue, Feb 9, 2021 at 1:40 PM Ash Berlin-Taylor <[email protected]> wrote:

> Hi everyone.
>
> After a good two weeks of playing whack-a-mole with bugs, I have finally
> merged https://github.com/apache/airflow/pull/13730 which means that
> *some* builds now run on machines under our control.
>
> The biggest differences this will make are that 1) we won't be stuck in a
> queue behind other ASF projects waiting for our "slot", and 2) builds should
> also be a bit faster now due to running most of the build on tmpfs.
>
> I will do a more in-depth write up soon, but the rough architecture is:
>
> - A GitHub application receives events and, whenever a check-run is
> created, posts to:
> - An AWS Lambda function (via API Gateway) that checks whether there is
> already an idle runner
> - An ASG that configures r5a.xlarge instances with tmpfs in "interesting"
> places (Docker store, tmp dirs, etc.)
> - Some clever processes on the instance that set/clear ScaleInProtection
> so that running jobs don't get killed, and emit a custom CloudWatch metric
> - A CloudWatch alarm to scale down the ASG when nodes are idle
> - A paid-for Docker Hub user on these machines to avoid hitting pull
> limits.
>
> The major downside is that, due to security concerns, builds from anyone
> who is not a committer or PMC member still run on the public queue.
> However, the "build image" step for everyone now runs on our machines, so
> everyone should benefit a bit.
>
> I do expect a bit of fallout from this, so I will be monitoring the
> Actions queue. If there are any problems or issues, let me know (here
> or on Slack).
>
> -ash
>
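
For readers following the architecture list in the quoted message, here is a rough sketch of the decision logic it describes: checking for an idle runner, deciding which instances should hold scale-in protection, and counting idle runners for a metric. It is only an illustration, not the code from the PR. The `Runner` class, the function names, and the instance ids are all hypothetical, and the idea that a new runner is requested when none is idle is an assumption (the message only says the Lambda checks for an idle runner). No AWS calls are made.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    """Hypothetical view of one self-hosted runner instance."""
    instance_id: str
    busy: bool  # True while a job is running on the instance


def needs_new_runner(runners):
    """Return True when no idle runner exists to pick up a new check-run.

    Assumption: a missing idle runner is what would trigger a scale-up.
    """
    return not any(not r.busy for r in runners)


def desired_protection(runners):
    """Map each instance id to whether it should be protected from scale-in.

    Busy instances are protected so that running jobs are not killed.
    """
    return {r.instance_id: r.busy for r in runners}


def idle_count(runners):
    """Number of idle runners, e.g. a value for a custom idle metric."""
    return sum(1 for r in runners if not r.busy)


# Made-up example fleet: one busy runner, one idle runner.
fleet = [Runner("i-aaa", busy=True), Runner("i-bbb", busy=False)]
print(needs_new_runner(fleet))
print(desired_protection(fleet))
print(idle_count(fleet))
```

In a real setup the protection map would presumably be applied through the Auto Scaling API and the idle count published to CloudWatch, with an alarm on that metric driving the scale-down.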
