Re: Add /ready-to-test bot to airflow CI

Ash Berlin-Taylor Mon, 26 Oct 2020 02:54:01 -0700

Is "ready to merge" also going to automatically merge if tests are green?


I think it shouldn't unless we also remove that label on a new push to the 
branch - consider this:
PR is reviewed and approved and a simple change, committer reviews it and gives 
it an approval; tests currently running
Label is applied

While tests are running PR author pushes malicious code

Tests for this new push pass and it's automatically merged.

Because of this I think "ready to merge" is actually the wrong name as it 
conveys extra meaning that we want to avoid. (And I also don't want to remove 
approvals when pushing, there are many many cases where it's just a small 
change requested, and we give approval with "make this change; I'm 
pre-emptively approving it")
-ash
On Oct 24 2020, at 8:49 pm, Daniel Imberman <daniel.imber...@gmail.com> wrote:
> I think ready to merge makes more sense
>
> via Newton Mail 
> (https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.51&pv=10.15.6&source=email_footer_2)
> On Sat, Oct 24, 2020 at 12:13 PM, Jarek Potiuk <jarek.pot...@polidea.com> 
> wrote:
> > Some short update - seems like we can get 50% 60% saving in job usage by 
> > the "unapproved PRs". We are progressing with implementation :D.
> >
> > On Sat, Oct 24, 2020 at 10:55 AM Jarek Potiuk <jarek.pot...@polidea.com 
> > (mailto:jarek.pot...@polidea.com)> wrote:
> > > FYI. We found out with Tobiasz, that it will be a bit better and if we 
> > > add "Approved" label in the PR that the workflow will automatically set 
> > > when the issue gets approved.
> > >
> > > This way we have state of the PR approval (we know when it changes) and 
> > > know that we should re-run last "small matrix" successful run when the 
> > > label changes to "Approved".
> > >
> > > This will also be an additional indication to committers in case of 
> > > queues and delays we se. It might be that the "small" matrix run is 
> > > already successful, the PR gets approved but the "full matrix build" is 
> > > delayed due to queuing. Such PR will have green "merge" button and might 
> > > get merged by mistake - but it will not have the "Approved" label yet. 
> > > Setting the label and re-running the build will happen at the same time.
> > >
> > > But I start thinking this label should be named differently - how about 
> > > "Ready to merge" maybe? Or maybe other ideas?
> > >
> > > J.
> > >
> > >
> > >
> > >
> > > On Sat, Oct 24, 2020 at 1:10 AM Daniel Imberman 
> > > <daniel.imber...@gmail.com (mailto:daniel.imber...@gmail.com)> wrote:
> > > > +1000!
> > > >
> > > > via Newton Mail 
> > > > (https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.51&pv=10.15.6&source=email_footer_2)
> > > > On Fri, Oct 23, 2020 at 8:22 AM, Jarek Potiuk <jarek.pot...@polidea.com 
> > > > (mailto:jarek.pot...@polidea.com)> wrote:
> > > > > Thanks Tobiasz :). fantastic.
> > > > >
> > > > > I prepared a very short and simple design doc 
> > > > > https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#
> > > > >  where we can collaborate.
> > > > >
> > > > > I also added you as collaborator to 
> > > > > https://github.com/potiuk/get-workflow-origin that we already use, 
> > > > > and I think you can update the "get workflow origin" plugin to 
> > > > > include status of the PR in the output of the action (ore maybe we 
> > > > > find out that we already have what we need in GitHub context).
> > > > >
> > > > > I will take a look at finding out how/if we can trigger the "full 
> > > > > build" automatically when approval status changes from "Not approved" 
> > > > > to "Approved".
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Oct 22, 2020 at 7:20 PM Tobiasz Kędzierski 
> > > > > <tobiasz.kedzier...@polidea.com 
> > > > > (mailto:tobiasz.kedzier...@polidea.com)> wrote:
> > > > > > Hi Jarek.
> > > > > >
> > > > > > sounds good to me. I am happy to help you as much as I can with it.
> > > > > > BR
> > > > > > Tobiasz
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk 
> > > > > > <jarek.pot...@polidea.com (mailto:jarek.pot...@polidea.com)> wrote:
> > > > > > > TLDR; I thought about it a bit and I have a proposal on how to 
> > > > > > > solve it even better - one that can be implemented over the 
> > > > > > > weekend (I volunteer :) ) and that can be very easily shared and 
> > > > > > > adopted by the other ASF projects so that we all collectively 
> > > > > > > decrease the strain on Github Actions.
> > > > > > >
> > > > > > > This is in parallel to our efforts on having self-hosted workers 
> > > > > > > of course, but I think it will be needed anyway. Let me put it in 
> > > > > > > a bit of context
> > > > > > >
> > > > > > > Problem statement:
> > > > > > >
> > > > > > > * the root cause of the problem is that we are competing with 
> > > > > > > many other projects of ASF for the 180 jobs. I have started the 
> > > > > > > discussion in bui...@apache.org (mailto:bui...@apache.org) about 
> > > > > > > this: 
> > > > > > > https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E
> > > > > > >  and it's clear all ASF projects using GA have the same problem 
> > > > > > > and compete against each other for the jobs.
> > > > > > > * if we decrease the strain on our side, this is not solving the 
> > > > > > > problem long term. We keep on doing it already, and we already 
> > > > > > > decrease a lot of strain, but other projects from ASF increase 
> > > > > > > their strain in the meantime (Apache Beam, Skywalking, and few 
> > > > > > > other projects are becoming heavy GitHub Actions users).
> > > > > > > * in all the projects that I looked at, we have the same root 
> > > > > > > cause. Matrix strategy of builds causes enormous strain on Github 
> > > > > > > Actions if the whole matrix is run for all PRs. We are going to 
> > > > > > > make it works sustainably only if we come up with an easy 
> > > > > > > solution, that can be applied to all those projects.
> > > > > > > * I think the comment-based PR triggering process is complex and 
> > > > > > > cumbersome to follow. It puts a LOT more effort on the committers 
> > > > > > > because they not only have to review and comment on the PRs but 
> > > > > > > also make decisions that those PRs are ready for "full build". 
> > > > > > > This is a lot of unnecessary effort and complicated process that 
> > > > > > > many of the ASF projects will not like to adopt
> > > > > > >
> > > > > > > Proposed Solution:
> > > > > > >
> > > > > > > Add an easy way to limit the matrix strategy to one "default" 
> > > > > > > combo for PRs THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET
> > > > > > >
> > > > > > > This can be easily done with Github Actions workflows - no need 
> > > > > > > to write a bot for this.
> > > > > > >
> > > > > > > Some details:
> > > > > > >
> > > > > > > * Custom GitHub Action (generic) that checks if the PRs are 
> > > > > > > approved by Committers (and no "disapprovals"). The action would 
> > > > > > > produce an output -> "Approved", "Not approved". The output could 
> > > > > > > be used to determine the matrix strategy scope (in our case we 
> > > > > > > already have support for dynamic matrix strategy that I added a 
> > > > > > > few weeks ago - so it's just a matter of wiring the output in).
> > > > > > > * Very small workflow with the same GithubAction run on 
> > > > > > > "pull_request_target" event. That workflow would effective 
> > > > > > > "observe" the PR, and when the status changes from "not approved" 
> > > > > > > to "approved", it triggers a PR build (with the "full matrix 
> > > > > > > strategy" this time because the PR will be already approved). 
> > > > > > > This seems to be entirely possible. This "pull_request_target" 
> > > > > > > workflow - similarly to "workflow_run" runs with a "write" access 
> > > > > > > token and uses a "main branch" workflow version and it could 
> > > > > > > easily trigger a rerun of the last PR build in such case.
> > > > > > >
> > > > > > > Benefits:
> > > > > > >
> > > > > > > - I think I could write such an action over the coming weekend 
> > > > > > > (happy to collaborate with anyone on that). I will first search 
> > > > > > > if someone has done something similar of course because maybe it 
> > > > > > > can be done faster this way, but I am quite confident after 
> > > > > > > writing my https://github.com/potiuk/cancel-workflow-runs which 
> > > > > > > we already use to limit the strain that it is doable in a day/two
> > > > > > > - no need to change the process we have - we continue working as 
> > > > > > > we did and simply "approved" PRs will be the full matrix strategy 
> > > > > > > ones but the "not-approved-ones" will run a limited version of 
> > > > > > > the checks.
> > > > > > > - no way to accidentally submit a breaking PR - when the 
> > > > > > > committer approves the PR that has not been approved before, the 
> > > > > > > PR build will be re-run with the "full matrix strategy" and not 
> > > > > > > mergeable until it finishes
> > > > > > > - last-but-not-least: we can propose (and help) other ASF 
> > > > > > > projects to use the action in their own GitHub Actions. It will 
> > > > > > > not be changing anyone's process - which makes it super-easy to 
> > > > > > > adopt and I can even turn it into a "recommended solution" by 
> > > > > > > Apache Infra - similarly as Airflow's CI architecture is a 
> > > > > > > recommended solution already for the integration of GA with 
> > > > > > > DockerHub 
> > > > > > > https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub
> > > > > > >
> > > > > > > WDYT?
> > > > > > >
> > > > > > > J
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk 
> > > > > > > <jarek.pot...@polidea.com (mailto:jarek.pot...@polidea.com)> 
> > > > > > > wrote:
> > > > > > > > I think it's a good idea, but I'd augment it a bit. A better 
> > > > > > > > option will be to run all test types but
> > > > > > > > for only one chosen combination of "Python + DB type + DB 
> > > > > > > > version".
> > > > > > > >
> > > > > > > > I often don't even look at the PR until the tests pass and this 
> > > > > > > > would be much better this way.
> > > > > > > > And often people have slower/small machines so they submit the 
> > > > > > > > PR to see if they have not
> > > > > > > > broken any other tests. This is much, much easier than doing it 
> > > > > > > > locally - because then in one
> > > > > > > > "fire&forget" you can run static, doc, unit tests, integration, 
> > > > > > > > and Kubernetes ones,. And it's a valid
> > > > > > > > thing.
> > > > > > > >
> > > > > > > > Also, we have to make sure that such PR does not become "Green" 
> > > > > > > > before all the tests are run.
> > > > > > > > This might be rather problematic as Github does not yet have a 
> > > > > > > > " manual" Approval step in
> > > > > > > > Github Actions (it's coming in Q4: 
> > > > > > > > https://github.com/github/roadmap/issues/99).
> > > > > > > >
> > > > > > > > We have many tests and already we hit a bug a few times, where 
> > > > > > > > not all tests have yet started
> > > > > > > > and we've merged such PR. I can imagine it will happen more and 
> > > > > > > > more often if all PRs will
> > > > > > > > only run a subset of tests. It will be very easy to make that 
> > > > > > > > mistake because even if we run a subset of
> > > > > > > > those tests, we have so many jobs that you cannot see them all 
> > > > > > > > in the GitHub UI.
> > > > > > > >
> > > > > > > > So we will have to have a check that fails the PR but marks it 
> > > > > > > > somehow as "Ready for review" for example adding
> > > > > > > > a label "Ready for review" when the subset of tests succeeds.
> > > > > > > >
> > > > > > > > Also, this might not be needed (or less important) if we 
> > > > > > > > implement: https://github.com/apache/airflow/issues/10507 
> > > > > > > > "Selective Tests"
> > > > > > > > for which I have an open PR. They will give much bigger 
> > > > > > > > improvements - because, in the vast majority of cases, the 
> > > > > > > > tests will take
> > > > > > > > very little time - giving feedback
> > > > > > > > about relevant tests in a few minutes rather than half-an-hour. 
> > > > > > > > We can also combine those two.
> > > > > > > >
> > > > > > > > It seems that I managed to finish some of the stuff that I 
> > > > > > > > thought will take more time, so I might come back to it next 
> > > > > > > > week
> > > > > > > > if it goes as well as I planned.
> > > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman 
> > > > > > > > <daniel.imber...@gmail.com (mailto:daniel.imber...@gmail.com)> 
> > > > > > > > wrote:
> > > > > > > > > I’m not too worried about that. I think people would learn 
> > > > > > > > > pretty quickly. It hasn’t been an issue for the kubernetes 
> > > > > > > > > community so I can’t imagine it being an issue for us. 
> > > > > > > > > End-of-day, we only have a limited amount of compute power 
> > > > > > > > > and this will increase the speed we merge the PR’s that have 
> > > > > > > > > passed basic code quality checks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek 
> > > > > > > > > <turbas...@apache.org (mailto:turbas...@apache.org)> wrote:
> > > > > > > > > > I think I can agree. Especially with flaky tests, some 
> > > > > > > > > > contributors may be confused that some of the tests don't 
> > > > > > > > > > work on CI but work locally...
> > > > > > > > > >
> > > > > > > > > > Checking the code quality is good first step. Once there's 
> > > > > > > > > > a review we can start tests on CI.
> > > > > > > > > >
> > > > > > > > > > On the other hand, I can see people asking for starting the 
> > > > > > > > > > tests or being even more confused why some PRs have more CI 
> > > > > > > > > > builds than others...
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Tomek
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman 
> > > > > > > > > > <daniel.imber...@gmail.com 
> > > > > > > > > > (mailto:daniel.imber...@gmail.com)> wrote:
> > > > > > > > > > > Hello all,
> > > > > > > > > > >
> > > > > > > > > > > With the recent uptick in airflow contribution and pull 
> > > > > > > > > > > requests, I have a proposal that I hope will ensure that 
> > > > > > > > > > > we do not find ourselves in a CI backlog hell. I noticed 
> > > > > > > > > > > that on the Kubernetes project, pull requests do not run 
> > > > > > > > > > > integration test until a committer submits a "ready to 
> > > > > > > > > > > test" command to the CI bot. This step can prevent draft 
> > > > > > > > > > > PRs or un-reviewed PRs from taking github CI resources. 
> > > > > > > > > > > It is worth noting that with breeze's docker based 
> > > > > > > > > > > testing system, users have the exact same testing 
> > > > > > > > > > > capabilities locally as they would on our CI.
> > > > > > > > > > >
> > > > > > > > > > > I propose that we allow unverified PR's to run basic and 
> > > > > > > > > > > static tests, but not perform the full test suite or 
> > > > > > > > > > > integration test without first being reviewed.
> > > > > > > > > > >
> > > > > > > > > > > What does everyone think?
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Jarek Potiuk
> > > > > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Jarek Potiuk
> > > > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > M: +48 660 796 129 (tel:+48660796129)
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > M: +48 660 796 129 (tel:+48660796129)

Re: Add /ready-to-test bot to airflow CI

Reply via email to