Hi all,

I have drafted a document summarizing the ideas suggested in this thread.

https://docs.google.com/document/d/10FlXOo_hL2QYTPhwS8uHSyJbQCzwC3K3C12tFccANA8/edit#

where I extracted the following action items:

short-term (easy) items
- Split the linting task (e.g. java checkStyle) out from the test (java
percommit) step

mid-to-long-term items
- A single github action that indicates the PR is ready to merge
- Merging the PR through action

I am happy to volunteer to take on this task. Once we have decided the
direction(s) to proceed I am going to detail the design.

Regards,
Yi



On Tue, Jun 28, 2022 at 10:40 PM Ahmet Altay <al...@google.com> wrote:

> I am worried about adding our own custom tooling. They require
> maintenance, they introduce new flakiness, and so far we have not been
> great about maintaining custom infra.
>
> On Tue, Jun 28, 2022 at 1:36 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> Thanks for all the super useful info. I can add some experience from the
>> early days of Beam: Because the GitHub "merge" button did not exist, we
>> rolled our own "merge bot". The pros/cons below are not general to the
>> concept, but to our experience with it:
>>
>> PROs:
>>  - It single-threaded commits to the repo so there weren't race
>> conditions on test results. I think this will be OK for our scale for the
>> foreseeable future.
>>  - It re-ran tests after merge but before pushing to master, which is the
>> second half of eliminating that race condition.
>>
>> CONs:
>>  - It was very flaky and often just didn't stay up. We danced for joy
>> when it was gone.
>>  - It ran a kind of arbitrary set of tests that didn't match the PR
>> statuses. It did not have any filter on which tests were run during merge.
>> We were small enough that mostly just "run the tests" was specific enough.
>>  - It always squashed the commits and then pushed the squashed commit
>> with a comment "this closes #<pr>". This messes up PRs that have multiple
>> commits that should remain separate. But more importantly it made it
>> impossible to easily distinguish PRs that were merged versus those that
>> were closed without merge. And of course it is way harder to navigate
>> history when the commits on master have different hashes than the commits
>> that were authored.
>>  - Getting reasonable logs with error messages when a merge failed for
>> whatever reason was hard or impossible IIRC.
>>
>> So I think in what you have said there is an option to get the best of
>> all, something like:
>>
>>  - Do merges through an action. Merge commit or squash-and-merge would be
>> separate labels. Or not bother with squash and merge, instead using
>> heuristics to block PRs with bad commit histories.
>>  - Have the workflow check that all PR statuses are green before
>> continuing to merge.
>>
>
> This sounds like a reasonable process. I assume the only addition in this
> case would be another github action.
>
>
>>
>> Kenn
>>
>> On Tue, Jun 28, 2022 at 8:03 AM Danny McCormick <
>> dannymccorm...@google.com> wrote:
>>
>>> After looking into this a little bit more, I need to revise my opinion
>>> on how we would do this; I don't think it's practical to have all required
>>> status checks via the .asf.yml file because those required checks can't be
>>> filtered by path (for example, if we want to require Python precommit on
>>> Python PRs, it would need to be required on *all *PRs). That's a GitHub
>>> limitation
>>> <https://docs.github.com/en/enterprise-cloud@latest/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/troubleshooting-required-status-checks#handling-skipped-but-required-checks>,
>>> not an ASF one.
>>>
>>> One option is to write an action that makes sure no checks are failing
>>> (except maybe codecov?) and put a single required check on that. That would
>>> also make it easy to build in logic to override required checks like Robert
>>> suggested ("specific wording would have to be in the last comment"). We
>>> already have logic in the PR bot that does some of this
>>> <https://github.com/apache/beam/blob/master/scripts/ci/pr-bot/shared/checks.ts>
>>> .
>>>
>>> The downside to that approach is that it's not clear what the best way
>>> to trigger that workflow is since it has to run after all other checks have
>>> completed. We could have it trigger on some label (e.g. "ready to merge")
>>> and then automatically merge the PR when it's done or comment and remove
>>> the label if checks are failing/incomplete. This changes the workflow for
>>> committers from "click the merge button" to "add a label", but doesn't
>>> require significantly more action or oversight and is pretty similar to how
>>> Kubernetes <https://github.com/kubernetes/kubernetes/pull/110812> and
>>> some other large repos run things.
>>>
>>> Another option would be to trigger that check at the end of every
>>> Actions run and use the check_suite trigger for external runs
>>> (unfortunately actions doesn't trigger that). That would require a lot of
>>> boilerplate on every Actions workflow though. There are other similar
>>> options which may be cleaner that also require modifying every Actions
>>> workflow, but I don't love that option.
>>>
>>> I'm still in favor of doing this via the "ready to merge" label option I
>>> just described, but this has dampened my enthusiasm a little bit since we'd
>>> need to build out/maintain our own tooling.
>>>
>>> On Tue, Jun 28, 2022 at 8:49 AM Danny McCormick <
>>> dannymccorm...@google.com> wrote:
>>>
>>>> Regarding code cov - it does check overall % coverage as well as the
>>>> coverage of the diff, but I don't think it's designed to be a blocking
>>>> metric. A good example of where this is not helpful is this pr to
>>>> remove dead code <https://github.com/apache/beam/pull/22019>. Because
>>>> it removes tested code, the pr lowers our coverage percentage and fails the
>>>> check, but it's a totally reasonable PR that doesn't need additional
>>>> testing (because it only removes functionality). Other classes of changes
>>>> that are problematic include things like improving hard to cover error
>>>> messages or tough to test proto changes which can only be covered in an
>>>> integration test that doesn't make it into codecov (these are both common
>>>> at least in the Go SDK).
>>>>
>>>> I'd be opposed to including that as a required check, though I support
>>>> adding pretty much every other suite.
>>>>
>>>> > I assume we would have a way to override it in case the repo gets
>>>> into a bad state (for example needing to modify the workflow definitions in
>>>> order to get them to pass).
>>>>
>>>> The way we make a check required is by adding it in the .asf.yml file
>>>> <https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Branchprotection>.
>>>> So we could always update that if needed to push an emergency fix as long
>>>> as we don't accidentally include any required checks on the .asf.yml
>>>> itself. If we mess that up we would need to bring in Infra to help.
>>>>
>>>> On Mon, Jun 27, 2022 at 7:59 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 27, 2022 at 4:55 PM Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Mon, Jun 27, 2022 at 3:06 PM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I am in favor of requiring precommit green before merge. To make
>>>>>>> this better, we should:
>>>>>>>
>>>>>>> (1) run fewer irrelevant tests, for example almost every
>>>>>>> language-based presubmit tests vastly more code than could possibly be
>>>>>>> affected.
>>>>>>> (2) run more relevant tests, for example when an IO is touched
>>>>>>> running slightly more heavyweight integration tests before merge is
>>>>>>> reasonable.
>>>>>>>
>>>>>>
>>>>>> Does it have to be all or nothing? E.g. we could start with requiring
>>>>>> a smaller subset of tests that must pass, with the goal of eventually
>>>>>> requiring (almost?) everything.
>>>>>>
>>>>>>
>>>>>>> I worry that requiring total line coverage in unit tests /
>>>>>>> presubmits will encourage mock-based tests that don't demonstrate any
>>>>>>> meaningful correctness. But I am OK to try it.
>>>>>>>
>>>>>>
>>>>> IIRC, codecov test is checking that % coverage is not reduced with the
>>>>> new PR and it is not checking for total line coverage. I agree with you
>>>>> that total line coverage is probably not meaningful in most cases.
>>>>>
>>>>>
>>>>>>
>>>>>> +1. I think there are a class of presubmits (coverage among them)
>>>>>> that are advisory rather than binding.
>>>>>>
>>>>>
>>>>> +1 - especially if we can make the tools distinguish between required
>>>>> and advisory presubmits.
>>>>>
>>>>>
>>>>>>
>>>>>>> I assume we would have a way to override it in case the repo gets
>>>>>>> into a bad state (for example needing to modify the workflow 
>>>>>>> definitions in
>>>>>>> order to get them to pass).
>>>>>>>
>>>>>>
>>>>>> +1. But making this explicit (e.g. specific wording would have to be
>>>>>> in the last comment) would be a good barrier to have even if it's not
>>>>>> absolute.
>>>>>>
>>>>>>
>>>>>>> On Mon, Jun 27, 2022 at 2:58 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just an opinion: If we have a precommit (including codecov) we
>>>>>>>> should be willing to enforce those passing before merging a PR. 
>>>>>>>> Selectively
>>>>>>>> applying when/which precommits would be blockers eventually leads to 
>>>>>>>> more
>>>>>>>> opinions. I think codecov is actually providing value by checking that
>>>>>>>> delta coverage is not dropped. In my experience, for PRs that fail the
>>>>>>>> codecov precommit there tends to be an opportunity for adding unit 
>>>>>>>> tests.
>>>>>>>>
>>>>>>>> On Mon, Jun 27, 2022 at 2:14 PM Danny McCormick <
>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Brian, you keep beating my emails by an instant :) I think codecov
>>>>>>>>> provides value independent of blocking a PR - it is a good sanity 
>>>>>>>>> check
>>>>>>>>> (e.g. if codecov drops from 50% to 5%) and it gives us a nice metric 
>>>>>>>>> to
>>>>>>>>> track over time.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 27, 2022 at 5:12 PM Danny McCormick <
>>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah, codecov is the one exception I had in mind, there may be
>>>>>>>>>> others that are reasonable to exclude as well
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 27, 2022 at 4:59 PM Jack McCluskey <
>>>>>>>>>> jrmcclus...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for enforcing some precommit checks as required, but making
>>>>>>>>>>> them all required would be a bit of a nightmare with how Codecov 
>>>>>>>>>>> works. Any
>>>>>>>>>>> Python or Go patch gets flagged as failing codecov if its changes 
>>>>>>>>>>> aren't
>>>>>>>>>>> covered at or above the current overall coverage level at master 
>>>>>>>>>>> (around
>>>>>>>>>>> 75% right now.) This is a nice goal, but ultimately not every 
>>>>>>>>>>> change lends
>>>>>>>>>>> itself to that level of unit testing. But I would be in favor of 
>>>>>>>>>>> things
>>>>>>>>>>> like the various ValidatesRunner suites being required.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Jack McCluskey
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 27, 2022 at 4:51 PM Danny McCormick <
>>>>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Brian beat me by a few seconds, but I actually agree that there
>>>>>>>>>>>> is a problem here - when Damon's PR was submitted there was 
>>>>>>>>>>>> basically no
>>>>>>>>>>>> way of knowing that checks would break without having detailed 
>>>>>>>>>>>> knowledge of
>>>>>>>>>>>> how the build system works (so I'm on board with your
>>>>>>>>>>>> Subject/Background/Assessment). With that said, I disagree with the
>>>>>>>>>>>> recommended path forward - our automation failed us here and I 
>>>>>>>>>>>> don't think
>>>>>>>>>>>> the correct response is to make our manual validation better. That 
>>>>>>>>>>>> shifts
>>>>>>>>>>>> the burden of verifying correctness from robots (easy to validate,
>>>>>>>>>>>> basically free) to humans (expensive, forgetful, and hard to 
>>>>>>>>>>>> validate,
>>>>>>>>>>>> especially for newer contributors). Instead, I think we should fix 
>>>>>>>>>>>> the
>>>>>>>>>>>> underlying technical problems, which are IMO:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Flaky tests unrelated to the PR caused it to fail
>>>>>>>>>>>>
>>>>>>>>>>>
> I missed this part earlier. I would encourage all committers who are
> blocked on flaky tests to file a new github issue with a link to the flaked
> test before re-running the flaking test. Re-running tests until they pass
> also makes it easier to introduce new flaky tests.
>
>
>> 2) There were no guardrails in place to avoid merging a PR that wasn't
>>>>>>>>>>>> fully validated.
>>>>>>>>>>>>
>>>>>>>>>>>> Things that would've made this better which I think we should
>>>>>>>>>>>> focus our attention on:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) (easy) Split the linting task out from the test step.
>>>>>>>>>>>> 2) This one is more controversial, but I would vote we start
>>>>>>>>>>>> enforcing some (or all?) precommit checks as required. I don't 
>>>>>>>>>>>> really blame
>>>>>>>>>>>> anyone for merging this PR since it had repeatedly run into 
>>>>>>>>>>>> unrelated
>>>>>>>>>>>> flakes, but that's kinda the point - we need to feel some of that 
>>>>>>>>>>>> pain to
>>>>>>>>>>>> avoid accidentally merging bad code. Blocking PRs on failed 
>>>>>>>>>>>> precommit
>>>>>>>>>>>> suites would force us to deal with these kinds of flakes and 
>>>>>>>>>>>> incentivize a
>>>>>>>>>>>> lot of good behavior in the repo (and disincentivize bad tests).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Danny
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 27, 2022 at 4:48 PM Brian Hulette <
>>>>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Damon,
>>>>>>>>>>>>> I think our CI system was working as intended for your PR. We
>>>>>>>>>>>>> select which PreCommits to run based on the files that are 
>>>>>>>>>>>>> edited, and the
>>>>>>>>>>>>> author/reviewer should collaborate to get those green before 
>>>>>>>>>>>>> merging the
>>>>>>>>>>>>> PR. Sometimes that will involve inspecting the logs in Jenkins to 
>>>>>>>>>>>>> find the
>>>>>>>>>>>>> failing gradle tasks and run them locally to reproduce.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like what broke down in this case was that the PR was
>>>>>>>>>>>>> accidentally merged despite the failing Java PreCommit run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 27, 2022 at 12:02 PM Damon Douglas <
>>>>>>>>>>>>> damondoug...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I write this to prevent a recent experience I encountered
>>>>>>>>>>>>>> contributing to the Apache Beam repository.  To address a system 
>>>>>>>>>>>>>> issue, I
>>>>>>>>>>>>>> write in SBAR
>>>>>>>>>>>>>> <https://www.hopkinsmedicine.org/antimicrobial-stewardship/nursing-toolkit/_docs/sbar-communication-tool.pdf>
>>>>>>>>>>>>>> format for clarity.  If there are no objections to the 
>>>>>>>>>>>>>> recommendation, I'm
>>>>>>>>>>>>>> happy to fulfill the first recommendation i.e. create GitHub 
>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Subject*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I submitted a PR <https://github.com/apache/beam/pull/21953>
>>>>>>>>>>>>>> that was subsequently reverted
>>>>>>>>>>>>>> <https://github.com/apache/beam/pull/22044>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Background*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Despite passing the following gradle tasks,
>>>>>>>>>>>>>> the PreCommit_Java_Commit failed with a test unrelated to
>>>>>>>>>>>>>> the code in the PR
>>>>>>>>>>>>>> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/23066/testReport/junit/org.apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImplTest/testInsertWithinRowCountLimits/>.
>>>>>>>>>>>>>> The PR was merged and then reverted due to discovered vendor 
>>>>>>>>>>>>>> dependency
>>>>>>>>>>>>>> issues and failed style checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ./gradlew rat
>>>>>>>>>>>>>> ./gradlew spotlessCheck
>>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:test
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Assessment*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Output from the following gradle commands did not surface
>>>>>>>>>>>>>> prior to the PR merge.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleMain
>>>>>>>>>>>>>> ./gradlew sdks:java:io:google-cloud-platform:checkStyleTest
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Both the aforementioned failing PreCommit_Java_Commit test
>>>>>>>>>>>>>> and a lack of knowledge contributed to the root cause.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Recommendation / Proposal*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I propose updating PR template description
>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md>
>>>>>>>>>>>>>>  that
>>>>>>>>>>>>>> prescribes gradle tasks to be successfully executed prior to 
>>>>>>>>>>>>>> submitting a
>>>>>>>>>>>>>> PR.  Toward this aim:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.  Four separate GitHub issues in
>>>>>>>>>>>>>> https://github.com/apache/beam representative of *common*,
>>>>>>>>>>>>>> and language specific: *Java*, *Python*, and *Go*,
>>>>>>>>>>>>>> prescribed gradle tasks
>>>>>>>>>>>>>> 2.  Don't reply to this email with specifics on how;
>>>>>>>>>>>>>> delegation of comments/discussion within the relevant context of 
>>>>>>>>>>>>>> the GitHub
>>>>>>>>>>>>>> issues in 1
>>>>>>>>>>>>>> 3.  PRs that fix the GitHub issues in 1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Damon Douglas*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Strategic Cloud Engineer, Data & Analytics, Google Cloud
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> damondoug...@google.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Reply via email to