Re: [DISCUSS] Consider disabling self-hosted runners for committer PRs

Amogh Desai Fri, 05 Apr 2024 03:09:38 -0700

+1 I like the idea.
Looking forward to seeing the difference.

Thanks & Regards,
Amogh Desai



On Fri, Apr 5, 2024 at 3:54 AM Ferruzzi, Dennis <[email protected]>
wrote:

> Interested in seeing the difference, +1
>
>
>  - ferruzzi
>
>
> ________________________________
> From: Oliveira, Niko <[email protected]>
> Sent: Thursday, April 4, 2024 2:00 PM
> To: [email protected]
> Subject: RE: [DISCUSS] Consider disabling self-hosted runners for
> committer PRs
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> +1 I'd love to see this as well.
>
> In the past, stability and long queue times of PR builds have been very
> frustrating. I'm not 100% sure this is due to using self-hosted runners,
> since a queue depth of 35 (to my mind) should be plenty. But something
> about the queuing in that setup has never seemed quite right to me.
> Switching to public runners for a while as an experiment would be a great
> way to see whether it improves.
>
> ________________________________
> From: Pankaj Koti <[email protected]>
> Sent: Thursday, April 4, 2024 12:41:02 PM
> To: [email protected]
> Subject: RE: [DISCUSS] Consider disabling self-hosted runners for
> committer PRs
>
> +1 from me to this idea.
>
> Sounds very reasonable to me.
> At times, my experience has been better with public runners than with
> self-hosted runners :)
>
> And as already mentioned in the discussion, I think having the ability to
> apply the "use-self-hosted-runners" label at critical times would be nice
> to have too.
>
>
> On Fri, 5 Apr 2024, 00:50 Jarek Potiuk, <[email protected]> wrote:
>
> > Hello everyone,
> >
> > TL;DR: With some recent changes in GitHub Actions, and the fact that the
> > ASF now has a lot of donated runners available for all builds, I think we
> > could experiment with disabling "self-hosted" runners for committer builds.
> >
> > Our self-hosted runners have been extremely helpful (and we should again
> > thank Amazon and Astronomer for donating credits / money for those) at a
> > time when the GitHub public runners were far less powerful and fewer of
> > them were available to ASF projects. This saved us a LOT of trouble when
> > there was contention between ASF projects.
> >
> > But recently both limitations have been largely removed:
> >
> > * ASF has 900 public runners donated by GitHub to all projects
> > * Those public runners now have (as of January) 4 CPUs and 16GB of memory
> >   for open-source projects:
> >
> > https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/
> >
> > While they are not as powerful as our self-hosted runners, the parallelism
> > we use keeps those builds in not-that-bad shape compared to self-hosted
> > runners. For the complete set of tests, a typical run now takes ~20m on
> > public runners and ~14m on self-hosted ones.
> >
> > But this is not the only factor - I think committers generally see "Job
> > failed" on self-hosted runners much more often than non-committers (the
> > stability of our solution is not the best, and we are also using cheaper
> > spot instances). Plus - we limit the total number of self-hosted runners
> > (35) - so if several committers submit a few PRs while a canary build is
> > running, the jobs will wait until runners are available.
> >
> > And of course it costs the credits/money of sponsors which we could use
> > for other things.
> >
> > I have - as of recently - access to GitHub Actions metrics - and while the
> > ASF is keeping an eye on usage and has started limiting the number of
> > parallel jobs that workflows in projects can run, it looks like even if
> > all committer runs are added to the public runners, our usage will still
> > be far below the limits and far lower than that of some other projects
> > (which I will not name here). I have access to the metrics so I can
> > monitor our usage and react.
> >
> > I think possibly - if we switch committers to "public" runners by default
> > - the experience will not be much worse for them (and sometimes even
> > better - because of better stability and less queuing).
> >
> > I have been planning this carefully - I recently made a number of
> > refactors/changes to our workflows that make it way easier to manipulate
> > the configuration and apply various conditions to various jobs - so
> > changing/experimenting with those settings should be - well - a breeze :).
> > A few recent changes have proven that the workflow refactor was definitely
> > worth the effort. I feel like I finally have control over it, where
> > previously it was a bit like herding a pack of cats (which I brought to
> > life myself, but that's another story).
> >
> > I would like to propose running an experiment to see how it works if we
> > switch committer PRs back to the public runners - leaving the self-hosted
> > runners only for canary builds (which makes perfect sense because those
> > builds run a full set of tests and we need as much speed and power there
> > as we can get).
> >
> > This is pretty safe. We should be able to switch back very easily if we
> > see problems. I will also monitor it to check that our usage stays within
> > the ASF limits. I can also add a feature that lets committers use
> > self-hosted runners by applying the "use self-hosted runners" label to a
> > PR.
> >
> > Running it for 2-3 weeks should be enough to gather experience from
> > committers - whether things will seem better or worse for them - or maybe
> > they won't really notice a big difference.
> >
> > Later we could consider some next steps - disabling the self-hosted
> > runners for canary builds if we see that our usage is low and builds are
> > fast enough, and eventually possibly removing the current self-hosted
> > runners and switching to a better k8s-based infrastructure (which we are
> > close to doing, but it is a bit difficult while the current self-hosted
> > solution is so critical to keep running - like rebuilding the plane while
> > it is flying). I'd love to do it gradually in the "change slowly and
> > observe" mode - especially now that I have access to "proper" metrics.
> >
> > WDYT?
> >
> > J.
> >
>
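
The runner-selection rule proposed in the quoted message (public runners for
committer PRs by default, self-hosted runners kept for canary builds, and a
label that lets a committer opt back in) can be sketched roughly as below.
This is only an illustration in Python, not the actual Airflow workflow
configuration: the Build class, its field names, and select_runner are
hypothetical, and the exact label string is an assumption (the thread spells
it both "use self-hosted runners" and "use-self-hosted-runners").

```python
from dataclasses import dataclass, field

# Assumed label text; the thread does not fix the exact spelling.
SELF_HOSTED_LABEL = "use self-hosted runners"


@dataclass
class Build:
    """Hypothetical description of a CI run, for illustration only."""
    is_canary: bool = False
    is_committer: bool = False
    labels: set[str] = field(default_factory=set)


def select_runner(build: Build) -> str:
    """Return "self-hosted" or "public" following the proposed experiment."""
    # Canary builds run the full test suite, so they keep the faster runners.
    if build.is_canary:
        return "self-hosted"
    # Committers can opt back in to self-hosted runners with a PR label.
    if build.is_committer and SELF_HOSTED_LABEL in build.labels:
        return "self-hosted"
    # Everything else, including committer PRs by default, goes to public runners.
    return "public"
```

For example, select_runner(Build(is_committer=True)) picks the public
runners, while adding the label to the same build switches it back to the
self-hosted ones. Keeping the rule in one place like this is what would make
reverting the experiment a one-line change.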
