In the past github squash and merge did not preserve the committer identity correctly, is it still the case? If so we should not be using it. https://github.com/isaacs/github/issues/1368
On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <rober...@google.com> wrote: > > On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev <valen...@google.com> > wrote: >> >> I always squash-and-merge even when there is only 1 commit. This avoids the >> necessity to edit the commit message to remove not so helpful "Merge pull >> request xxx" message. Is there any harm to recommend squash by default in >> the upcoming squash bot even for single commit PRs? > > > Does squash-and-merge in that case preserve the commit as-is if there's only > one? In that case, there'd be no issues of history. (I opted to not comment > on 1-commit PRs to be less chatty.) > >> >> >> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw <rober...@google.com> wrote: >>> >>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <k...@apache.org> wrote: >>>> >>>> >>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko >>>> <aromanenko....@gmail.com> wrote: >>>>> >>>>> Thanks Ismael for bringing this on the table again. Kind of my >>>>> “favourite” topic, unfortunately, that I raised a couple of times… Let me >>>>> share some of my thoughts on this. >>>>> >>>>> First of all, as Beam developers, honestly we have to agree if we care >>>>> about our commits history or not. If not (or not so much) then probably >>>>> there is no more things to discuss and we use Git as just Git… It’s not a >>>>> bad thing, it’s just different but for large projects, like Beam, clear >>>>> commits history is ultra important, imho. >>>>> >>>>> Well, for now we do care and we clearly mention this in our Contribution >>>>> Guide. Probably, it sounds only as a recommendation there or not all >>>>> contributors (especially first-time ones) read this or take this into >>>>> account or pay attention on this. It’s fine and we always can expect not >>>>> following our guide because of many different reasons. And this is >>>>> exactly where Committers have to play their role! I mean that our clear >>>>> Git history mostly relies on committer's shoulders and, before clicking >>>>> on Merge button, every committer have (even “must" I’d say) make sure >>>>> that PR respects all our rules (we have them because of some reasons, >>>>> right?) and ready to be merged. Nice and correct titles/messages is one >>>>> this thing. Personally, the first thing that I do once I start to do a >>>>> review and before merge, is checking the PR’s title, branches (if it’s >>>>> from a feature branch and against main Beam branch), number of commits >>>>> and their messages. Then I take a look on related Jira issue which ID >>>>> should be prefixed to PR's title and commit’s message(s). >>>>> >>>>> For sure, there are always exceptions. In case of emergency, for example, >>>>> if the build is broken because of tiny thing then it makes sense to fix >>>>> this as fast as possible and then, probably, to neglect some rules. But >>>>> if exceptions become the common practice and happen quite often, then >>>>> it’s a signal that either we have to change the rules or change our >>>>> attitude to this. >>>>> >>>>> As I see, the initial Ismael’s message of his email was more about titles >>>>> and multiple commits per PR is another but, of course, related topic. For >>>>> both, I believe we can partly automate it - add checks to prevent merging >>>>> the commits with not correct messages or/and limit number of commits per >>>>> PR, for example. Some other big projects, like Apache Spark, have even >>>>> special tool to merge PR in well-formed way [1]. I’m not sure that we >>>>> need to have something similar because I’m pretty sure it will affect the >>>>> performance of adding new fixes/features (at least in the beginning), but >>>>> since we already started the similar discussions several times on regular >>>>> bases, we might want to think in this way as an option too. >>>> >>>> >>>> Noting that we had one too [1]. The trouble was that the bot had a lot of >>>> downtime, code was not part of Beam's repo, and also did not encode best >>>> practices (for example it broke the connection between PRs and master >>>> branch history because it just cherry-picked and squashed commits and >>>> pushed those new unrelated commits straight to master). A script would >>>> address much of this. >>> >>> >>> Yeah, the mergebot was much more hassle than it was worth, and lots harder >>> to use than pushing a button. I wouldn't be opposed to trying again with a >>> better (simpler, under our control) one (and in my investigations of github >>> actions, it doesn't look that hard). >>> >>>> Heuristic CI that says "this commit history looks OK" might solve a lot of >>>> the problem (I see that Robert already started on this). >>>> >>>> And finally I was to repeat my agreement with Ismaël and Alexey that the >>>> root problem is this: we need to actually care about the commit history >>>> and communication of PR/commit titles and descriptions. We use tools to >>>> help us to implement our intentions and to communicate them to newcomers, >>>> but I don't think this will replace taking care of the repo. >>> >>> >>> Committers should care about taking care of the repo more than the average >>> contributor, but even there there is high variance. I think the issue is >>> "oh, I didn't think to squash vs. merge" rather than "who cares, I always >>> press merge anyway" in which case a timely reminder will go a long way. >>> >>>> Kenn >>>> >>>> [1] >>>> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E >>>> >>>>> >>>>> >>>>> As for me, I’d prefer that every committer paid more attention (if not >>>>> yet) on these “non code” things before reviewing/merging a PR. >>>>> >>>>> [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py >>>>> >>>>> >>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <rober...@google.com> wrote: >>>>> >>>>> I am also in the camp that it often makes sense to have more than 1 >>>>> commit per PR, but rather than enforce a 1 commit per PR policy, I would >>>>> say that it's too much bother to look at the commit history whether it >>>>> should be squashed or merged (though I think it is almost always very >>>>> obvious which is preferable for a given PR), go ahead and squash rather >>>>> than merge by default. >>>>> >>>>> >>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <k...@apache.org> wrote: >>>>>> >>>>>> This seems to come up a lot. Maybe we should change something? >>>>>> >>>>>> Having worked on a number of projects and at companies with this policy, >>>>>> companies using non-distributed source control, and companies that just >>>>>> "use git like git", I know all these ways of life pretty well. >>>>>> >>>>>> TL;DR my experience is: >>>>>> - when people care about the commit history and take care of it, then >>>>>> just "use git like git" results in faster development and clearer >>>>>> history, despite intermediate commits not being tested by >>>>>> Jenkins/Travis/GHA >>>>>> - when people see git as an inconvenience, view the history as an >>>>>> implementation detail, or think in linear history of PR merges and view >>>>>> the commits as an implementation detail, it becomes a mess >>>>>> >>>>>> Empirically, this is what I expect from a 1 commit = 1 PR policy (and >>>>>> how I feel about each point): >>>>>> - fewer commits with bad messages (yay!) >>>>>> - simpler git graph if we squash + rebase (meh) >>>>>> - larger commits of related-but-independent changes (could be OK) >>>>>> - commits with bullet points in their description that bundle unrelated >>>>>> changes (sad face) >>>>>> - slowdown of development (neutral - slow can be good) >>>>>> - fewer "quality of life" improvements, since those would add lines of >>>>>> diff to a PR and are off topic; when they have to be done in a separate >>>>>> PR they don't get done and they don't get reviewed with the same >>>>>> priority (extra sad face) >>>>>> >>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs where >>>>>> there are 2-5 fairly independent commits. It is "to aid code review" but >>>>>> not in the way you might think: The best size for code review is pretty >>>>>> big, compared to the best size for commit. A commit is the unit of >>>>>> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits >>>>>> not being independently tested is important: this is a tooling issue, >>>>>> but not that easy to change. Here is why I am not that worried about it: >>>>>> I believe strongly in a "rollback first" policy to restore greenness, >>>>>> but also that the rollback change itself must be verified to restore >>>>>> greenness. When a multi-commit PR fails, you can easily open a revert of >>>>>> the whole PR as well as reverts of individual suspect commits. The CI >>>>>> for these will finish around the same time, and if you manage a smaller >>>>>> revert, great! Imagine if to revert a PR you had to revert _every_ >>>>>> change between HEAD and that PR. It would restore to a known green >>>>>> state. Yet we don't do this, because we have technology that makes it >>>>>> unnecessary. Ultimately, single large commits with bullet points are >>>>>> just an unstructured version of multi-commit PRs. So I favor the >>>>>> structure. But people seem to be more likely to write good bullet points >>>>>> than to write independent commits. Perhaps because it is easier.</rant> >>>>>> >>>>>> So at this point, I think I am OK with a 1 commit per PR policy. I think >>>>>> the net benefits to our commit history would be good. I have grown tired >>>>>> of repeating the conversation. Rebase-and-squash edits commit ids in >>>>>> ways that confuses tools, so I do not favor this. Tooling that merges >>>>>> one commit at a time (without altering commit id) would also be super >>>>>> cool and not that hard. It would prevent intermediate results from >>>>>> merging, solving both problems. >>>>>> >>>>>> Kenn >>>>>> >>>>>> >>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <bhule...@google.com> >>>>>> wrote: >>>>>>> >>>>>>> I'd argue that the history is almost always "most useful" when one PR >>>>>>> == one commit on master. Intermediate commits from a PR may be useful >>>>>>> to aid code review, but they're not verified by presubmits and thus >>>>>>> aren't necessarily independently revertible, so I see little value in >>>>>>> keeping them around on master. In fact if you're breaking up a PR into >>>>>>> multiple commits to aid code review, it's worth considering if they >>>>>>> could/should be separately reviewed and verified PRs. >>>>>>> We could solve the unwanted commit issue if we have a policy to always >>>>>>> "Squash and Merge" PRs with rare exceptions. >>>>>>> >>>>>>> I agree jira/PR titles could be better, I'm not sure what we can do >>>>>>> about it aside from reminding committers of this responsibility. >>>>>>> Perhaps the triage process can help catch poorly titled jiras? >>>>>>> >>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw <rober...@google.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this >>>>>>>> up. >>>>>>>> >>>>>>>> For merging unwanted commits, can we automate a simple check (e.g. >>>>>>>> with github actions)? >>>>>>>> >>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <suzt...@google.com> wrote: >>>>>>>>> >>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide >>>>>>>>> [1], I see I was not following this >>>>>>>>> >>>>>>>>> > The reviewer should give the LGTM and then request that the author >>>>>>>>> > of the pull request rebase, squash, split, etc, the commits, so >>>>>>>>> > that the history is most useful >>>>>>>>> >>>>>>>>> >>>>>>>>> Thank you for the feedback on this matter! (And I don't think we >>>>>>>>> should change the contribution guide) >>>>>>>>> >>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/ >>>>>>>>> >>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <ieme...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > Hello, >>>>>>>>> > >>>>>>>>> > I have noticed an ongoing pattern of carelessness around issues/PR >>>>>>>>> > titles and >>>>>>>>> > descriptions. It is really painful to see more and more examples >>>>>>>>> > like: >>>>>>>>> > >>>>>>>>> > BEAM-12160 Add TODO for fixing the warning >>>>>>>>> > BEAM-12165 Fix ParquetIO >>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use >>>>>>>>> > toMinutes (commit) >>>>>>>>> > >>>>>>>>> > In all these cases with just a bit of detail in the title it would >>>>>>>>> > be enough to >>>>>>>>> > make other contributors or reviewers life easierm as well as to >>>>>>>>> > have a better >>>>>>>>> > project history. What astonishes me apart of the lack of care is >>>>>>>>> > that some of >>>>>>>>> > those are from Beam commmitters. >>>>>>>>> > >>>>>>>>> > We already have discussed about not paying attention during commit >>>>>>>>> > merges where >>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, and >>>>>>>>> > nothing has >>>>>>>>> > changed so I am wondering if we should maybe just totally remove >>>>>>>>> > that rule (for >>>>>>>>> > commits) and also eventually for titles and descriptions. >>>>>>>>> > >>>>>>>>> > Ismaël >>>>>>>>> > >>>>>>>>> > [1] https://beam.apache.org/contribute/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Regards, >>>>>>>>> Tomo >>>>> >>>>>