This seems to come up a lot. Maybe we should change something?

Having worked on a number of projects and at companies with this policy,
companies using non-distributed source control, and companies that just
"use git like git", I know all these ways of life pretty well.

TL;DR my experience is:
 - when people care about the commit history and take care of it, then just
"use git like git" results in faster development and clearer history,
despite intermediate commits not being tested by Jenkins/Travis/GHA
 - when people see git as an inconvenience, view the history as an
implementation detail, or think in linear history of PR merges and view the
commits as an implementation detail, it becomes a mess

Empirically, this is what I expect from a 1 commit = 1 PR policy (and how I
feel about each point):
 - fewer commits with bad messages (yay!)
 - simpler git graph if we squash + rebase (meh)
 - larger commits of related-but-independent changes (could be OK)
 - commits with bullet points in their description that bundle unrelated
changes (sad face)
 - slowdown of development (neutral - slow can be good)
 - fewer "quality of life" improvements, since those would add lines of
diff to a PR and are off topic; when they have to be done in a separate PR
they don't get done and they don't get reviewed with the same priority
(extra sad face)

<rant>I know I am in the minority. I tend to have a lot of PRs where there
are 2-5 fairly independent commits. It is "to aid code review" but not in
the way you might think: The best size for code review is pretty big,
compared to the best size for commit. A commit is the unit of roll-forward,
roll-back, cherry-pick, etc. Brian's point about commits not being
independently tested is important: this is a tooling issue, but not that
easy to change. Here is why I am not that worried about it: I believe
strongly in a "rollback first" policy to restore greenness, but also that
the rollback change itself must be verified to restore greenness. When a
multi-commit PR fails, you can easily open a revert of the whole PR as well
as reverts of individual suspect commits. The CI for these will finish
around the same time, and if you manage a smaller revert, great! Imagine if
to revert a PR you had to revert _every_ change between HEAD and that PR.
It would restore to a known green state. Yet we don't do this, because we
have technology that makes it unnecessary. Ultimately, single large commits
with bullet points are just an unstructured version of multi-commit PRs. So
I favor the structure. But people seem to be more likely to write good
bullet points than to write independent commits. Perhaps because it is
easier.</rant>

So at this point, I think I am OK with a 1 commit per PR policy. I think
the net benefits to our commit history would be good. I have grown tired of
repeating the conversation. Rebase-and-squash edits commit ids in ways that
confuses tools, so I do not favor this. Tooling that merges one commit at a
time (without altering commit id) would also be super cool and not that
hard. It would prevent intermediate results from merging, solving both
problems.

Kenn


On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <bhule...@google.com> wrote:

> I'd argue that the history is almost always "most useful" when one PR ==
> one commit on master. Intermediate commits from a PR may be useful to aid
> code review, but they're not verified by presubmits and thus aren't
> necessarily independently revertible, so I see little value in keeping them
> around on master. In fact if you're breaking up a PR into multiple commits
> to aid code review, it's worth considering if they could/should be
> separately reviewed and verified PRs.
> We could solve the unwanted commit issue if we have a policy to always
> "Squash and Merge" PRs with rare exceptions.
>
> I agree jira/PR titles could be better, I'm not sure what we can do about
> it aside from reminding committers of this responsibility. Perhaps the
> triage process can help catch poorly titled jiras?
>
> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this
>> up.
>>
>> For merging unwanted commits, can we automate a simple check (e.g. with
>> github actions)?
>>
>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <suzt...@google.com> wrote:
>>
>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
>>> [1], I see I was not following this
>>>
>>> > The reviewer should give the LGTM and then request that the author of
>>> the pull request rebase, squash, split, etc, the commits, so that the
>>> history is most useful
>>>
>>>
>>> Thank you for the feedback on this matter! (And I don't think we
>>> should change the contribution guide)
>>>
>>> [1] https://beam.apache.org/contribute/committer-guide/
>>>
>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I have noticed an ongoing pattern of carelessness around issues/PR
>>> titles and
>>> > descriptions. It is really painful to see more and more examples like:
>>> >
>>> > BEAM-12160 Add TODO for fixing the warning
>>> > BEAM-12165 Fix ParquetIO
>>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
>>> > toMinutes (commit)
>>> >
>>> > In all these cases with just a bit of detail in the title it would be
>>> enough to
>>> > make other contributors or reviewers life easierm as well as to have a
>>> better
>>> > project history.  What astonishes me apart of the lack of care is that
>>> some of
>>> > those are from Beam commmitters.
>>> >
>>> > We already have discussed about not paying attention during commit
>>> merges where
>>> > some PRs end up merging tons of 'unwanted' fixup commits, and nothing
>>> has
>>> > changed so I am wondering if we should maybe just totally remove that
>>> rule (for
>>> > commits) and also eventually for titles and descriptions.
>>> >
>>> > Ismaël
>>> >
>>> > [1] https://beam.apache.org/contribute/
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>

Reply via email to