+1 I like the idea. Looking forward to seeing the difference.

Thanks & Regards,
Amogh Desai

On Fri, Apr 5, 2024 at 3:54 AM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote:

> Interested in seeing the difference, +1
>
> - ferruzzi
>
> ________________________________
> From: Oliveira, Niko <oniko...@amazon.com.INVALID>
> Sent: Thursday, April 4, 2024 2:00 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Consider disabling
> self-hosted runners for commiter PRs
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> +1 I'd love to see this as well.
>
> In the past, the instability and long queue times of PR builds have been
> very frustrating. I'm not 100% sure this is due to using self-hosted
> runners, since a queue depth of 35 should (to my mind) be plenty. But
> something about the queuing in that setup has never seemed quite right to
> me. Switching to public runners for a while as an experiment would be a
> great way to see whether it improves.
>
> ________________________________
> From: Pankaj Koti <pankaj.k...@astronomer.io.INVALID>
> Sent: Thursday, April 4, 2024 12:41:02 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Consider disabling
> self-hosted runners for commiter PRs
>
> +1 from me to this idea.
>
> Sounds very reasonable to me.
> At times, my experience has been better with public runners than with
> self-hosted runners :)
>
> And as already mentioned in the discussion, I think the ability to apply
> the label "use-self-hosted-runners" at critical times would be nice to
> have too.
>
> On Fri, 5 Apr 2024, 00:50 Jarek Potiuk, <ja...@potiuk.com> wrote:
>
> > Hello everyone,
> >
> > TL;DR: With some recent changes in GitHub Actions and the fact that ASF
> > has a lot of runners available, donated for all the builds, I think we
> > could experiment with disabling "self-hosted" runners for committer
> > builds.
> >
> > Our self-hosted runners have been extremely helpful (and we should
> > again thank Amazon and Astronomer for donating credits / money for
> > those) back when the GitHub public runners were far less powerful and
> > we had fewer of them available for ASF projects. This saved us a LOT of
> > trouble when there was contention between ASF projects.
> >
> > But recently both limitations have been largely removed:
> >
> > * ASF has 900 public runners donated by GitHub to all projects
> > * Those public runners (as of January) now have 4 CPUs and 16GB of
> >   memory for open-source projects -
> >   https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/
> >
> > While they are not as powerful as our self-hosted runners, the
> > parallelism we utilise brings those builds into not-that-bad shape
> > compared to self-hosted runners. Typical times for the complete set of
> > tests are now ~20m on public runners and ~14m on self-hosted ones.
> >
> > But this is not the only factor - I think committers experience "Job
> > failed" on self-hosted runners generally much more often than
> > non-committers do (the stability of our solution is not the best, and
> > we are also using cheaper spot instances). Plus, we limit the total
> > number of self-hosted runners (35), so if several committers submit a
> > few PRs while a canary build is running, the jobs will wait until
> > runners are available.
> >
> > And of course it costs the credits/money of sponsors, which we could
> > use for other things.
> >
> > I recently got access to GitHub Actions metrics. While ASF is keeping
> > an eye on usage and has started limiting the number of parallel jobs
> > that workflows in projects can run, it looks like even if all committer
> > runs are added to the public runners, our usage will still be far below
> > the limits and far lower than that of some other projects (which I will
> > not name here). Since I have access to the metrics, I can monitor our
> > usage and react.
> >
> > I think possibly - if we switch committers to "public" runners by
> > default - the experience will not be much worse for them (and sometimes
> > even better - because of stability/limited queue).
> >
> > I have been planning this carefully - I recently made a number of
> > refactors/changes to our workflows that make it way easier to
> > manipulate the configuration and apply various conditions to various
> > jobs - so changing/experimenting with those settings should be - well -
> > a breeze :). A few recent changes have proven that the workflow
> > refactor was definitely worth the effort. I feel like I finally have
> > control over it, where previously it was a bit like herding a pack of
> > cats (which I brought to life myself, but that's another story).
> >
> > I would like to propose running an experiment to see how it works if
> > we switch committer PRs back to the public runners - leaving the
> > self-hosted runners only for canary builds (which makes perfect sense
> > because those builds run a full set of tests and we need as much speed
> > and power there as we can get).
> >
> > This is pretty safe. We should be able to switch back very easily if we
> > see problems. I will also monitor it and see if our usage is within the
> > limits of the ASF. I can also add a feature so that committers are able
> > to use self-hosted runners by applying the "use self-hosted runners"
> > label to a PR.
> >
> > Running it for 2-3 weeks should be enough to gather experience from
> > committers - whether things seem better or worse for them - or maybe
> > they won't really notice a big difference.
> >
> > Later we could consider some next steps - disabling the self-hosted
> > runners for canary builds if we see that our usage is low and builds
> > are fast enough, and eventually possibly removing the current
> > self-hosted runners and switching to a better k8s-based infrastructure
> > (which we are close to doing, but it is a bit difficult while the
> > current self-hosted solution is so critical to keep running - like
> > rebuilding the plane while it is flying). I'd love to do it gradually
> > in the "change slowly and observe" mode - especially now that I have
> > access to "proper" metrics.
> >
> > WDYT?
> >
> > J.
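
For anyone who wants to reason about the routing rule proposed in the quoted message, here is a minimal sketch of it in Python: canary builds stay on self-hosted runners, committer PRs default to public runners, and a committer can opt back in with a label. The function name `select_runner`, its parameters, and the exact label string are assumptions made for illustration (the thread spells the label two different ways); this is not the actual Airflow workflow code.

```python
from typing import Iterable

PUBLIC = "public"
SELF_HOSTED = "self-hosted"

# Assumed label text; the thread uses both "use self-hosted runners"
# and "use-self-hosted-runners".
OVERRIDE_LABEL = "use self-hosted runners"


def select_runner(is_canary: bool, is_committer: bool, labels: Iterable[str]) -> str:
    """Pick a runner pool following the rule sketched in the proposal."""
    # Canary builds run the full test suite, so they keep self-hosted runners.
    if is_canary:
        return SELF_HOSTED
    # A committer can opt back in to self-hosted runners via the PR label.
    if is_committer and OVERRIDE_LABEL in set(labels):
        return SELF_HOSTED
    # Everything else, including committer PRs by default, uses public runners.
    return PUBLIC


if __name__ == "__main__":
    print(select_runner(is_canary=False, is_committer=True, labels=[]))
    print(select_runner(is_canary=False, is_committer=True, labels=[OVERRIDE_LABEL]))
    print(select_runner(is_canary=True, is_committer=False, labels=[]))
```

Keeping the decision in one small function mirrors the point in the proposal that the settings should be easy to change and to switch back if the experiment goes badly.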