I think I am wrong about this. It seems like for squashed/rebased commits it is still GitHub that is committer? But it does seem to have the metadata about who did the squash & merge. This pattern of storing important metadata outside of git is not a good direction.
Kenn On Thu, Apr 22, 2021 at 12:45 PM Kenneth Knowles <[email protected]> wrote: > That is unfortunate that GitHub is the committer of merge commits :-/ > though I suppose you have the author field you can use. It is unfortunate > the this is a different field based on method. > > Kenn > > On Thu, Apr 22, 2021 at 12:39 PM Ismaël Mejía <[email protected]> wrote: > >> I was not referring to author identity but to committer identity that >> matters to know who accepted to merge something but it seems we are >> not really using this much because github is the 'committer' of merge >> commits too :S maybe something we can improve as part of this >> discussion. >> >> git show --pretty=full COMMITID >> >> On Thu, Apr 22, 2021 at 9:10 PM Valentyn Tymofieiev <[email protected]> >> wrote: >> > >> > Author identity is preserved. Here's an output of 'git log' >> > >> > commit 93ecc1d3a4b997b2490c4439972ffaf09125299f >> > Merge: 2e9ee8c005 4e3decbb4e >> <------ a merge commit that merges 2 commit, >> 4e3decbb4e and it's parent. Author history is preserved on 4e3decbb4e >> > Author: Ismaël Mejía <[email protected]> >> <------ this is the author of merge commit >> > Date: Thu Apr 22 12:46:38 2021 +0200 >> > >> > Merge pull request #14616: [BEAM-12207] Remove log messages about >> files to stage. <------ Note that message was edited, and does not >> include a branch, which is nice! >> > commit 2e9ee8c0052d96045588e617c9e5de017f30454a >> > >> > >> > commit 28020effca12a18a65799ac7d2d3d520d73072d7 >> > Author: yoshiki.obata <[email protected]> >> > Date: Thu Apr 22 11:57:45 2021 +0900 >> > >> > [BEAM-7372] cleanup codes for py2 from apache_beam/transforms >> (#14544) <--- 1-commit PR was squashed-and-merged by me. Author's >> identity is preserved >> > >> > On Thu, Apr 22, 2021 at 11:47 AM Ismaël Mejía <[email protected]> >> wrote: >> >> >> >> In the past github squash and merge did not preserve the committer >> >> identity correctly, is it still the case? If so we should not be >> >> using it. >> >> https://github.com/isaacs/github/issues/1368 >> >> >> >> On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <[email protected]> >> wrote: >> >> > >> >> > On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev < >> [email protected]> wrote: >> >> >> >> >> >> I always squash-and-merge even when there is only 1 commit. This >> avoids the necessity to edit the commit message to remove not so helpful >> "Merge pull request xxx" message. Is there any harm to recommend squash by >> default in the upcoming squash bot even for single commit PRs? >> >> > >> >> > >> >> > Does squash-and-merge in that case preserve the commit as-is if >> there's only one? In that case, there'd be no issues of history. (I opted >> to not comment on 1-commit PRs to be less chatty.) >> >> > >> >> >> >> >> >> >> >> >> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw < >> [email protected]> wrote: >> >> >>> >> >> >>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <[email protected]> >> wrote: >> >> >>>> >> >> >>>> >> >> >>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko < >> [email protected]> wrote: >> >> >>>>> >> >> >>>>> Thanks Ismael for bringing this on the table again. Kind of my >> “favourite” topic, unfortunately, that I raised a couple of times… Let me >> share some of my thoughts on this. >> >> >>>>> >> >> >>>>> First of all, as Beam developers, honestly we have to agree if >> we care about our commits history or not. If not (or not so much) then >> probably there is no more things to discuss and we use Git as just Git… >> It’s not a bad thing, it’s just different but for large projects, like >> Beam, clear commits history is ultra important, imho. >> >> >>>>> >> >> >>>>> Well, for now we do care and we clearly mention this in our >> Contribution Guide. Probably, it sounds only as a recommendation there or >> not all contributors (especially first-time ones) read this or take this >> into account or pay attention on this. It’s fine and we always can expect >> not following our guide because of many different reasons. And this is >> exactly where Committers have to play their role! I mean that our clear Git >> history mostly relies on committer's shoulders and, before clicking on >> Merge button, every committer have (even “must" I’d say) make sure that PR >> respects all our rules (we have them because of some reasons, right?) and >> ready to be merged. Nice and correct titles/messages is one this thing. >> Personally, the first thing that I do once I start to do a review and >> before merge, is checking the PR’s title, branches (if it’s from a feature >> branch and against main Beam branch), number of commits and their messages. >> Then I take a look on related Jira issue which ID should be prefixed to >> PR's title and commit’s message(s). >> >> >>>>> >> >> >>>>> For sure, there are always exceptions. In case of emergency, for >> example, if the build is broken because of tiny thing then it makes sense >> to fix this as fast as possible and then, probably, to neglect some rules. >> But if exceptions become the common practice and happen quite often, then >> it’s a signal that either we have to change the rules or change our >> attitude to this. >> >> >>>>> >> >> >>>>> As I see, the initial Ismael’s message of his email was more >> about titles and multiple commits per PR is another but, of course, related >> topic. For both, I believe we can partly automate it - add checks to >> prevent merging the commits with not correct messages or/and limit number >> of commits per PR, for example. Some other big projects, like Apache Spark, >> have even special tool to merge PR in well-formed way [1]. I’m not sure >> that we need to have something similar because I’m pretty sure it will >> affect the performance of adding new fixes/features (at least in the >> beginning), but since we already started the similar discussions several >> times on regular bases, we might want to think in this way as an option too. >> >> >>>> >> >> >>>> >> >> >>>> Noting that we had one too [1]. The trouble was that the bot had >> a lot of downtime, code was not part of Beam's repo, and also did not >> encode best practices (for example it broke the connection between PRs and >> master branch history because it just cherry-picked and squashed commits >> and pushed those new unrelated commits straight to master). A script would >> address much of this. >> >> >>> >> >> >>> >> >> >>> Yeah, the mergebot was much more hassle than it was worth, and >> lots harder to use than pushing a button. I wouldn't be opposed to trying >> again with a better (simpler, under our control) one (and in my >> investigations of github actions, it doesn't look that hard). >> >> >>> >> >> >>>> Heuristic CI that says "this commit history looks OK" might solve >> a lot of the problem (I see that Robert already started on this). >> >> >>>> >> >> >>>> And finally I was to repeat my agreement with Ismaël and Alexey >> that the root problem is this: we need to actually care about the commit >> history and communication of PR/commit titles and descriptions. We use >> tools to help us to implement our intentions and to communicate them to >> newcomers, but I don't think this will replace taking care of the repo. >> >> >>> >> >> >>> >> >> >>> Committers should care about taking care of the repo more than the >> average contributor, but even there there is high variance. I think the >> issue is "oh, I didn't think to squash vs. merge" rather than "who cares, I >> always press merge anyway" in which case a timely reminder will go a long >> way. >> >> >>> >> >> >>>> Kenn >> >> >>>> >> >> >>>> [1] >> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E >> >> >>>> >> >> >>>>> >> >> >>>>> >> >> >>>>> As for me, I’d prefer that every committer paid more attention >> (if not yet) on these “non code” things before reviewing/merging a PR. >> >> >>>>> >> >> >>>>> [1] >> https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py >> >> >>>>> >> >> >>>>> >> >> >>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <[email protected]> >> wrote: >> >> >>>>> >> >> >>>>> I am also in the camp that it often makes sense to have more >> than 1 commit per PR, but rather than enforce a 1 commit per PR policy, I >> would say that it's too much bother to look at the commit history whether >> it should be squashed or merged (though I think it is almost always very >> obvious which is preferable for a given PR), go ahead and squash rather >> than merge by default. >> >> >>>>> >> >> >>>>> >> >> >>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <[email protected]> >> wrote: >> >> >>>>>> >> >> >>>>>> This seems to come up a lot. Maybe we should change something? >> >> >>>>>> >> >> >>>>>> Having worked on a number of projects and at companies with >> this policy, companies using non-distributed source control, and companies >> that just "use git like git", I know all these ways of life pretty well. >> >> >>>>>> >> >> >>>>>> TL;DR my experience is: >> >> >>>>>> - when people care about the commit history and take care of >> it, then just "use git like git" results in faster development and clearer >> history, despite intermediate commits not being tested by Jenkins/Travis/GHA >> >> >>>>>> - when people see git as an inconvenience, view the history as >> an implementation detail, or think in linear history of PR merges and view >> the commits as an implementation detail, it becomes a mess >> >> >>>>>> >> >> >>>>>> Empirically, this is what I expect from a 1 commit = 1 PR >> policy (and how I feel about each point): >> >> >>>>>> - fewer commits with bad messages (yay!) >> >> >>>>>> - simpler git graph if we squash + rebase (meh) >> >> >>>>>> - larger commits of related-but-independent changes (could be >> OK) >> >> >>>>>> - commits with bullet points in their description that bundle >> unrelated changes (sad face) >> >> >>>>>> - slowdown of development (neutral - slow can be good) >> >> >>>>>> - fewer "quality of life" improvements, since those would add >> lines of diff to a PR and are off topic; when they have to be done in a >> separate PR they don't get done and they don't get reviewed with the same >> priority (extra sad face) >> >> >>>>>> >> >> >>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs >> where there are 2-5 fairly independent commits. It is "to aid code review" >> but not in the way you might think: The best size for code review is pretty >> big, compared to the best size for commit. A commit is the unit of >> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits not >> being independently tested is important: this is a tooling issue, but not >> that easy to change. Here is why I am not that worried about it: I believe >> strongly in a "rollback first" policy to restore greenness, but also that >> the rollback change itself must be verified to restore greenness. When a >> multi-commit PR fails, you can easily open a revert of the whole PR as well >> as reverts of individual suspect commits. The CI for these will finish >> around the same time, and if you manage a smaller revert, great! Imagine if >> to revert a PR you had to revert _every_ change between HEAD and that PR. >> It would restore to a known green state. Yet we don't do this, because we >> have technology that makes it unnecessary. Ultimately, single large commits >> with bullet points are just an unstructured version of multi-commit PRs. So >> I favor the structure. But people seem to be more likely to write good >> bullet points than to write independent commits. Perhaps because it is >> easier.</rant> >> >> >>>>>> >> >> >>>>>> So at this point, I think I am OK with a 1 commit per PR >> policy. I think the net benefits to our commit history would be good. I >> have grown tired of repeating the conversation. Rebase-and-squash edits >> commit ids in ways that confuses tools, so I do not favor this. Tooling >> that merges one commit at a time (without altering commit id) would also be >> super cool and not that hard. It would prevent intermediate results from >> merging, solving both problems. >> >> >>>>>> >> >> >>>>>> Kenn >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette < >> [email protected]> wrote: >> >> >>>>>>> >> >> >>>>>>> I'd argue that the history is almost always "most useful" when >> one PR == one commit on master. Intermediate commits from a PR may be >> useful to aid code review, but they're not verified by presubmits and thus >> aren't necessarily independently revertible, so I see little value in >> keeping them around on master. In fact if you're breaking up a PR into >> multiple commits to aid code review, it's worth considering if they >> could/should be separately reviewed and verified PRs. >> >> >>>>>>> We could solve the unwanted commit issue if we have a policy >> to always "Squash and Merge" PRs with rare exceptions. >> >> >>>>>>> >> >> >>>>>>> I agree jira/PR titles could be better, I'm not sure what we >> can do about it aside from reminding committers of this responsibility. >> Perhaps the triage process can help catch poorly titled jiras? >> >> >>>>>>> >> >> >>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw < >> [email protected]> wrote: >> >> >>>>>>>> >> >> >>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for >> bringing this up. >> >> >>>>>>>> >> >> >>>>>>>> For merging unwanted commits, can we automate a simple check >> (e.g. with github actions)? >> >> >>>>>>>> >> >> >>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki < >> [email protected]> wrote: >> >> >>>>>>>>> >> >> >>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading >> committer guide >> >> >>>>>>>>> [1], I see I was not following this >> >> >>>>>>>>> >> >> >>>>>>>>> > The reviewer should give the LGTM and then request that >> the author of the pull request rebase, squash, split, etc, the commits, so >> that the history is most useful >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> Thank you for the feedback on this matter! (And I don't >> think we >> >> >>>>>>>>> should change the contribution guide) >> >> >>>>>>>>> >> >> >>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/ >> >> >>>>>>>>> >> >> >>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía < >> [email protected]> wrote: >> >> >>>>>>>>> > >> >> >>>>>>>>> > Hello, >> >> >>>>>>>>> > >> >> >>>>>>>>> > I have noticed an ongoing pattern of carelessness around >> issues/PR titles and >> >> >>>>>>>>> > descriptions. It is really painful to see more and more >> examples like: >> >> >>>>>>>>> > >> >> >>>>>>>>> > BEAM-12160 Add TODO for fixing the warning >> >> >>>>>>>>> > BEAM-12165 Fix ParquetIO >> >> >>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and >> BEAM-12173 use >> >> >>>>>>>>> > toMinutes (commit) >> >> >>>>>>>>> > >> >> >>>>>>>>> > In all these cases with just a bit of detail in the title >> it would be enough to >> >> >>>>>>>>> > make other contributors or reviewers life easierm as well >> as to have a better >> >> >>>>>>>>> > project history. What astonishes me apart of the lack of >> care is that some of >> >> >>>>>>>>> > those are from Beam commmitters. >> >> >>>>>>>>> > >> >> >>>>>>>>> > We already have discussed about not paying attention >> during commit merges where >> >> >>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, >> and nothing has >> >> >>>>>>>>> > changed so I am wondering if we should maybe just totally >> remove that rule (for >> >> >>>>>>>>> > commits) and also eventually for titles and descriptions. >> >> >>>>>>>>> > >> >> >>>>>>>>> > Ismaël >> >> >>>>>>>>> > >> >> >>>>>>>>> > [1] https://beam.apache.org/contribute/ >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> >> >> >>>>>>>>> -- >> >> >>>>>>>>> Regards, >> >> >>>>>>>>> Tomo >> >> >>>>> >> >> >>>>> >> >
