On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <k...@apache.org> wrote:

>
> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko <aromanenko....@gmail.com>
> wrote:
>
>> Thanks Ismael for bringing this on the table again. Kind of my
>> “favourite” topic, unfortunately, that I raised a couple of times… Let me
>> share some of my thoughts on this.
>>
>> First of all, as Beam developers, honestly we have to agree if we care
>> about our commits history or not. If not (or not so much) then probably
>> there is no more things to discuss and we use Git as just Git… It’s not a
>> bad thing, it’s just different but for large projects, like Beam, clear
>> commits history is ultra important, imho.
>>
>> Well, for now we do care and we clearly mention this in our Contribution
>> Guide. Probably, it sounds only as a recommendation there or not all
>> contributors (especially first-time ones) read this or take this into
>> account or pay attention on this. It’s fine and we always can expect not
>> following our guide because of many different reasons. And this is exactly
>> where Committers have to play their role! I mean that our clear Git history
>> mostly relies on committer's shoulders and, before clicking on *Merge*
>> button, every committer have (even “must" I’d say) make sure that PR
>> respects all our rules (we have them because of some reasons, right?) and
>> ready to be merged. Nice and correct titles/messages is one this thing.
>> Personally, the first thing that I do once I start to do a review and
>> before merge, is checking the PR’s title, branches (if it’s from a feature
>> branch and against main Beam branch), number of commits and their messages.
>> Then I take a look on related Jira issue which ID should be prefixed to
>> PR's title and commit’s message(s).
>>
>> For sure, there are always exceptions. In case of emergency, for example,
>> if the build is broken because of tiny thing then it makes sense to fix
>> this as fast as possible and then, probably, to neglect some rules. But if
>> exceptions become the common practice and happen quite often, then it’s a
>> signal that either we have to change the rules or change our attitude to
>> this.
>>
>> As I see, the initial Ismael’s message of his email was more about titles
>> and multiple commits per PR is another but, of course, related topic. For
>> both, I believe we can partly automate it - add checks to prevent merging
>> the commits with not correct messages or/and limit number of commits per
>> PR, for example. Some other big projects, like Apache Spark, have even
>> special tool to merge PR in well-formed way [1]. I’m not sure that we need
>> to have something similar because I’m pretty sure it will affect the
>> performance of adding new fixes/features (at least in the beginning), but
>> since we already started the similar discussions several times on regular
>> bases, we might want to think in this way as an option too.
>>
>
> Noting that we had one too [1]. The trouble was that the bot had a lot of
> downtime, code was not part of Beam's repo, and also did not encode best
> practices (for example it broke the connection between PRs and master
> branch history because it just cherry-picked and squashed commits and
> pushed those new unrelated commits straight to master). A script would
> address much of this.
>

Yeah, the mergebot was much more hassle than it was worth, and lots harder
to use than pushing a button. I wouldn't be opposed to trying again with a
better (simpler, under our control) one (and in my investigations of github
actions, it doesn't look that hard).

Heuristic CI that says "this commit history looks OK" might solve a lot
> of the problem (I see that Robert already started on this).
>
> And finally I was to repeat my agreement with Ismaël and Alexey that the
> root problem is this: we need to actually care about the commit history and
> communication of PR/commit titles and descriptions. We use tools to help us
> to implement our intentions and to communicate them to newcomers, but I
> don't think this will replace taking care of the repo.
>

Committers should care about taking care of the repo more than the average
contributor, but even there there is high variance. I think the issue is
"oh, I didn't think to squash vs. merge" rather than "who cares, I always
press merge anyway" in which case a timely reminder will go a long way.

Kenn
>
> [1]
> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E
>
>
>>
>> As for me, I’d prefer that every committer paid more attention (if not
>> yet) on these “non code” things before reviewing/merging a PR.
>>
>> [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
>>
>>
>> On 22 Apr 2021, at 01:28, Robert Bradshaw <rober...@google.com> wrote:
>>
>> I am also in the camp that it often makes sense to have more than 1
>> commit per PR, but rather than enforce a 1 commit per PR policy, I would
>> say that it's too much bother to look at the commit history whether it
>> should be squashed or merged (though I think it is almost always very
>> obvious which is preferable for a given PR), go ahead and squash rather
>> than merge by default.
>>
>>
>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> This seems to come up a lot. Maybe we should change something?
>>>
>>> Having worked on a number of projects and at companies with this policy,
>>> companies using non-distributed source control, and companies that just
>>> "use git like git", I know all these ways of life pretty well.
>>>
>>> TL;DR my experience is:
>>>  - when people care about the commit history and take care of it, then
>>> just "use git like git" results in faster development and clearer history,
>>> despite intermediate commits not being tested by Jenkins/Travis/GHA
>>>  - when people see git as an inconvenience, view the history as an
>>> implementation detail, or think in linear history of PR merges and view the
>>> commits as an implementation detail, it becomes a mess
>>>
>>> Empirically, this is what I expect from a 1 commit = 1 PR policy (and
>>> how I feel about each point):
>>>  - fewer commits with bad messages (yay!)
>>>  - simpler git graph if we squash + rebase (meh)
>>>  - larger commits of related-but-independent changes (could be OK)
>>>  - commits with bullet points in their description that bundle unrelated
>>> changes (sad face)
>>>  - slowdown of development (neutral - slow can be good)
>>>  - fewer "quality of life" improvements, since those would add lines of
>>> diff to a PR and are off topic; when they have to be done in a separate PR
>>> they don't get done and they don't get reviewed with the same priority
>>> (extra sad face)
>>>
>>> <rant>I know I am in the minority. I tend to have a lot of PRs where
>>> there are 2-5 fairly independent commits. It is "to aid code review" but
>>> not in the way you might think: The best size for code review is pretty
>>> big, compared to the best size for commit. A commit is the unit of
>>> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits not
>>> being independently tested is important: this is a tooling issue, but not
>>> that easy to change. Here is why I am not that worried about it: I believe
>>> strongly in a "rollback first" policy to restore greenness, but also that
>>> the rollback change itself must be verified to restore greenness. When a
>>> multi-commit PR fails, you can easily open a revert of the whole PR as well
>>> as reverts of individual suspect commits. The CI for these will finish
>>> around the same time, and if you manage a smaller revert, great! Imagine if
>>> to revert a PR you had to revert _every_ change between HEAD and that PR.
>>> It would restore to a known green state. Yet we don't do this, because we
>>> have technology that makes it unnecessary. Ultimately, single large commits
>>> with bullet points are just an unstructured version of multi-commit PRs. So
>>> I favor the structure. But people seem to be more likely to write good
>>> bullet points than to write independent commits. Perhaps because it is
>>> easier.</rant>
>>>
>>> So at this point, I think I am OK with a 1 commit per PR policy. I think
>>> the net benefits to our commit history would be good. I have grown tired of
>>> repeating the conversation. Rebase-and-squash edits commit ids in ways that
>>> confuses tools, so I do not favor this. Tooling that merges one commit at a
>>> time (without altering commit id) would also be super cool and not that
>>> hard. It would prevent intermediate results from merging, solving both
>>> problems.
>>>
>>> Kenn
>>>
>>>
>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <bhule...@google.com>
>>> wrote:
>>>
>>>> I'd argue that the history is almost always "most useful" when one PR
>>>> == one commit on master. Intermediate commits from a PR may be useful to
>>>> aid code review, but they're not verified by presubmits and thus aren't
>>>> necessarily independently revertible, so I see little value in keeping them
>>>> around on master. In fact if you're breaking up a PR into multiple commits
>>>> to aid code review, it's worth considering if they could/should be
>>>> separately reviewed and verified PRs.
>>>> We could solve the unwanted commit issue if we have a policy to always
>>>> "Squash and Merge" PRs with rare exceptions.
>>>>
>>>> I agree jira/PR titles could be better, I'm not sure what we can do
>>>> about it aside from reminding committers of this responsibility. Perhaps
>>>> the triage process can help catch poorly titled jiras?
>>>>
>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this
>>>>> up.
>>>>>
>>>>> For merging unwanted commits, can we automate a simple check (e.g.
>>>>> with github actions)?
>>>>>
>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <suzt...@google.com>
>>>>> wrote:
>>>>>
>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
>>>>>> [1], I see I was not following this
>>>>>>
>>>>>> > The reviewer should give the LGTM and then request that the author
>>>>>> of the pull request rebase, squash, split, etc, the commits, so that the
>>>>>> history is most useful
>>>>>>
>>>>>>
>>>>>> Thank you for the feedback on this matter! (And I don't think we
>>>>>> should change the contribution guide)
>>>>>>
>>>>>> [1] https://beam.apache.org/contribute/committer-guide/
>>>>>>
>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <ieme...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hello,
>>>>>> >
>>>>>> > I have noticed an ongoing pattern of carelessness around issues/PR
>>>>>> titles and
>>>>>> > descriptions. It is really painful to see more and more examples
>>>>>> like:
>>>>>> >
>>>>>> > BEAM-12160 Add TODO for fixing the warning
>>>>>> > BEAM-12165 Fix ParquetIO
>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
>>>>>> > toMinutes (commit)
>>>>>> >
>>>>>> > In all these cases with just a bit of detail in the title it would
>>>>>> be enough to
>>>>>> > make other contributors or reviewers life easierm as well as to
>>>>>> have a better
>>>>>> > project history.  What astonishes me apart of the lack of care is
>>>>>> that some of
>>>>>> > those are from Beam commmitters.
>>>>>> >
>>>>>> > We already have discussed about not paying attention during commit
>>>>>> merges where
>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, and
>>>>>> nothing has
>>>>>> > changed so I am wondering if we should maybe just totally remove
>>>>>> that rule (for
>>>>>> > commits) and also eventually for titles and descriptions.
>>>>>> >
>>>>>> > Ismaël
>>>>>> >
>>>>>> > [1] https://beam.apache.org/contribute/
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Tomo
>>>>>>
>>>>>
>>

Reply via email to