I was not referring to author identity but to committer identity that
matters to know who accepted to merge something but it seems we are
not really using this much because github is the 'committer' of merge
commits too :S maybe something we can improve as part of this
discussion.

git show --pretty=full COMMITID

On Thu, Apr 22, 2021 at 9:10 PM Valentyn Tymofieiev <[email protected]> wrote:
>
> Author identity is preserved. Here's an output of 'git log'
>
> commit 93ecc1d3a4b997b2490c4439972ffaf09125299f
> Merge: 2e9ee8c005 4e3decbb4e                                                  
>                 <------ a merge commit that merges 2 commit, 4e3decbb4e and 
> it's parent. Author history is preserved on 4e3decbb4e
> Author: Ismaël Mejía <[email protected]>                                      
>            <------  this is the author of merge commit
> Date:   Thu Apr 22 12:46:38 2021 +0200
>
>     Merge pull request #14616: [BEAM-12207] Remove log messages about files 
> to stage.    <------ Note that message was edited, and does not include a 
> branch, which is nice!
> commit 2e9ee8c0052d96045588e617c9e5de017f30454a
>
>
> commit 28020effca12a18a65799ac7d2d3d520d73072d7
> Author: yoshiki.obata <[email protected]>
> Date:   Thu Apr 22 11:57:45 2021 +0900
>
>     [BEAM-7372] cleanup codes for py2 from apache_beam/transforms (#14544)    
>  <--- 1-commit PR  was squashed-and-merged by me. Author's identity is 
> preserved
>
> On Thu, Apr 22, 2021 at 11:47 AM Ismaël Mejía <[email protected]> wrote:
>>
>> In the past github squash and merge did not preserve the committer
>> identity correctly, is it still the case? If  so we should not be
>> using it.
>> https://github.com/isaacs/github/issues/1368
>>
>> On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <[email protected]> wrote:
>> >
>> > On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev <[email protected]> 
>> > wrote:
>> >>
>> >> I always squash-and-merge even when there is only 1 commit. This avoids 
>> >> the necessity to edit the commit message to remove not so helpful "Merge 
>> >> pull request xxx" message. Is there any harm to recommend squash by 
>> >> default in the upcoming squash bot even for single commit PRs?
>> >
>> >
>> > Does squash-and-merge in that case preserve the commit as-is if there's 
>> > only one? In that case, there'd be no issues of history. (I opted to not 
>> > comment on 1-commit PRs to be less chatty.)
>> >
>> >>
>> >>
>> >> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw <[email protected]> 
>> >> wrote:
>> >>>
>> >>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <[email protected]> wrote:
>> >>>>
>> >>>>
>> >>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko 
>> >>>> <[email protected]> wrote:
>> >>>>>
>> >>>>> Thanks Ismael for bringing this on the table again. Kind of my 
>> >>>>> “favourite” topic, unfortunately, that I raised a couple of times… Let 
>> >>>>> me share some of my thoughts on this.
>> >>>>>
>> >>>>> First of all, as Beam developers, honestly we have to agree if we care 
>> >>>>> about our commits history or not. If not (or not so much) then 
>> >>>>> probably there is no more things to discuss and we use Git as just 
>> >>>>> Git… It’s not a bad thing, it’s just different but for large projects, 
>> >>>>> like Beam, clear commits history is ultra important, imho.
>> >>>>>
>> >>>>> Well, for now we do care and we clearly mention this in our 
>> >>>>> Contribution Guide. Probably, it sounds only as a recommendation there 
>> >>>>> or not all contributors (especially first-time ones) read this or take 
>> >>>>> this into account or pay attention on this. It’s fine and we always 
>> >>>>> can expect not following our guide because of many different reasons. 
>> >>>>> And this is exactly where Committers have to play their role! I mean 
>> >>>>> that our clear Git history mostly relies on committer's shoulders and, 
>> >>>>> before clicking on Merge button, every committer have (even “must" I’d 
>> >>>>> say) make sure that PR respects all our rules (we have them because of 
>> >>>>> some reasons, right?) and ready to be merged. Nice and correct 
>> >>>>> titles/messages is one this thing. Personally, the first thing that I 
>> >>>>> do once I start to do a review and before merge, is checking the PR’s 
>> >>>>> title, branches (if it’s from a feature branch and against main Beam 
>> >>>>> branch), number of commits and their messages. Then I take a look on 
>> >>>>> related Jira issue which ID should be prefixed to PR's title and 
>> >>>>> commit’s message(s).
>> >>>>>
>> >>>>> For sure, there are always exceptions. In case of emergency, for 
>> >>>>> example, if the build is broken because of tiny thing then it makes 
>> >>>>> sense to fix this as fast as possible and then, probably, to neglect 
>> >>>>> some rules. But if exceptions become the common practice and happen 
>> >>>>> quite often, then it’s a signal that either we have to change the 
>> >>>>> rules or change our attitude to this.
>> >>>>>
>> >>>>> As I see, the initial Ismael’s message of his email was more about 
>> >>>>> titles and multiple commits per PR is another but, of course, related 
>> >>>>> topic. For both, I believe we can partly automate it - add checks to 
>> >>>>> prevent merging the commits with not correct messages or/and limit 
>> >>>>> number of commits per PR, for example. Some other big projects, like 
>> >>>>> Apache Spark, have even special tool to merge PR in well-formed way 
>> >>>>> [1]. I’m not sure that we need to have something similar because I’m 
>> >>>>> pretty sure it will affect the performance of adding new 
>> >>>>> fixes/features (at least in the beginning), but since we already 
>> >>>>> started the similar discussions several times on regular bases, we 
>> >>>>> might want to think in this way as an option too.
>> >>>>
>> >>>>
>> >>>> Noting that we had one too [1]. The trouble was that the bot had a lot 
>> >>>> of downtime, code was not part of Beam's repo, and also did not encode 
>> >>>> best practices (for example it broke the connection between PRs and 
>> >>>> master branch history because it just cherry-picked and squashed 
>> >>>> commits and pushed those new unrelated commits straight to master). A 
>> >>>> script would address much of this.
>> >>>
>> >>>
>> >>> Yeah, the mergebot was much more hassle than it was worth, and lots 
>> >>> harder to use than pushing a button. I wouldn't be opposed to trying 
>> >>> again with a better (simpler, under our control) one (and in my 
>> >>> investigations of github actions, it doesn't look that hard).
>> >>>
>> >>>> Heuristic CI that says "this commit history looks OK" might solve a lot 
>> >>>> of the problem (I see that Robert already started on this).
>> >>>>
>> >>>> And finally I was to repeat my agreement with Ismaël and Alexey that 
>> >>>> the root problem is this: we need to actually care about the commit 
>> >>>> history and communication of PR/commit titles and descriptions. We use 
>> >>>> tools to help us to implement our intentions and to communicate them to 
>> >>>> newcomers, but I don't think this will replace taking care of the repo.
>> >>>
>> >>>
>> >>> Committers should care about taking care of the repo more than the 
>> >>> average contributor, but even there there is high variance. I think the 
>> >>> issue is "oh, I didn't think to squash vs. merge" rather than "who 
>> >>> cares, I always press merge anyway" in which case a timely reminder will 
>> >>> go a long way.
>> >>>
>> >>>> Kenn
>> >>>>
>> >>>> [1] 
>> >>>> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> As for me, I’d prefer that every committer paid more attention (if not 
>> >>>>> yet) on these “non code” things before reviewing/merging a PR.
>> >>>>>
>> >>>>> [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
>> >>>>>
>> >>>>>
>> >>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <[email protected]> wrote:
>> >>>>>
>> >>>>> I am also in the camp that it often makes sense to have more than 1 
>> >>>>> commit per PR, but rather than enforce a 1 commit per PR policy, I 
>> >>>>> would say that it's too much bother to look at the commit history 
>> >>>>> whether it should be squashed or merged (though I think it is almost 
>> >>>>> always very obvious which is preferable for a given PR), go ahead and 
>> >>>>> squash rather than merge by default.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <[email protected]> 
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> This seems to come up a lot. Maybe we should change something?
>> >>>>>>
>> >>>>>> Having worked on a number of projects and at companies with this 
>> >>>>>> policy, companies using non-distributed source control, and companies 
>> >>>>>> that just "use git like git", I know all these ways of life pretty 
>> >>>>>> well.
>> >>>>>>
>> >>>>>> TL;DR my experience is:
>> >>>>>>  - when people care about the commit history and take care of it, 
>> >>>>>> then just "use git like git" results in faster development and 
>> >>>>>> clearer history, despite intermediate commits not being tested by 
>> >>>>>> Jenkins/Travis/GHA
>> >>>>>>  - when people see git as an inconvenience, view the history as an 
>> >>>>>> implementation detail, or think in linear history of PR merges and 
>> >>>>>> view the commits as an implementation detail, it becomes a mess
>> >>>>>>
>> >>>>>> Empirically, this is what I expect from a 1 commit = 1 PR policy (and 
>> >>>>>> how I feel about each point):
>> >>>>>>  - fewer commits with bad messages (yay!)
>> >>>>>>  - simpler git graph if we squash + rebase (meh)
>> >>>>>>  - larger commits of related-but-independent changes (could be OK)
>> >>>>>>  - commits with bullet points in their description that bundle 
>> >>>>>> unrelated changes (sad face)
>> >>>>>>  - slowdown of development (neutral - slow can be good)
>> >>>>>>  - fewer "quality of life" improvements, since those would add lines 
>> >>>>>> of diff to a PR and are off topic; when they have to be done in a 
>> >>>>>> separate PR they don't get done and they don't get reviewed with the 
>> >>>>>> same priority (extra sad face)
>> >>>>>>
>> >>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs where 
>> >>>>>> there are 2-5 fairly independent commits. It is "to aid code review" 
>> >>>>>> but not in the way you might think: The best size for code review is 
>> >>>>>> pretty big, compared to the best size for commit. A commit is the 
>> >>>>>> unit of roll-forward, roll-back, cherry-pick, etc. Brian's point 
>> >>>>>> about commits not being independently tested is important: this is a 
>> >>>>>> tooling issue, but not that easy to change. Here is why I am not that 
>> >>>>>> worried about it: I believe strongly in a "rollback first" policy to 
>> >>>>>> restore greenness, but also that the rollback change itself must be 
>> >>>>>> verified to restore greenness. When a multi-commit PR fails, you can 
>> >>>>>> easily open a revert of the whole PR as well as reverts of individual 
>> >>>>>> suspect commits. The CI for these will finish around the same time, 
>> >>>>>> and if you manage a smaller revert, great! Imagine if to revert a PR 
>> >>>>>> you had to revert _every_ change between HEAD and that PR. It would 
>> >>>>>> restore to a known green state. Yet we don't do this, because we have 
>> >>>>>> technology that makes it unnecessary. Ultimately, single large 
>> >>>>>> commits with bullet points are just an unstructured version of 
>> >>>>>> multi-commit PRs. So I favor the structure. But people seem to be 
>> >>>>>> more likely to write good bullet points than to write independent 
>> >>>>>> commits. Perhaps because it is easier.</rant>
>> >>>>>>
>> >>>>>> So at this point, I think I am OK with a 1 commit per PR policy. I 
>> >>>>>> think the net benefits to our commit history would be good. I have 
>> >>>>>> grown tired of repeating the conversation. Rebase-and-squash edits 
>> >>>>>> commit ids in ways that confuses tools, so I do not favor this. 
>> >>>>>> Tooling that merges one commit at a time (without altering commit id) 
>> >>>>>> would also be super cool and not that hard. It would prevent 
>> >>>>>> intermediate results from merging, solving both problems.
>> >>>>>>
>> >>>>>> Kenn
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <[email protected]> 
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> I'd argue that the history is almost always "most useful" when one 
>> >>>>>>> PR == one commit on master. Intermediate commits from a PR may be 
>> >>>>>>> useful to aid code review, but they're not verified by presubmits 
>> >>>>>>> and thus aren't necessarily independently revertible, so I see 
>> >>>>>>> little value in keeping them around on master. In fact if you're 
>> >>>>>>> breaking up a PR into multiple commits to aid code review, it's 
>> >>>>>>> worth considering if they could/should be separately reviewed and 
>> >>>>>>> verified PRs.
>> >>>>>>> We could solve the unwanted commit issue if we have a policy to 
>> >>>>>>> always "Squash and Merge" PRs with rare exceptions.
>> >>>>>>>
>> >>>>>>> I agree jira/PR titles could be better, I'm not sure what we can do 
>> >>>>>>> about it aside from reminding committers of this responsibility. 
>> >>>>>>> Perhaps the triage process can help catch poorly titled jiras?
>> >>>>>>>
>> >>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw 
>> >>>>>>> <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing 
>> >>>>>>>> this up.
>> >>>>>>>>
>> >>>>>>>> For merging unwanted commits, can we automate a simple check (e.g. 
>> >>>>>>>> with github actions)?
>> >>>>>>>>
>> >>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <[email protected]> 
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer 
>> >>>>>>>>> guide
>> >>>>>>>>> [1], I see I was not following this
>> >>>>>>>>>
>> >>>>>>>>> > The reviewer should give the LGTM and then request that the 
>> >>>>>>>>> > author of the pull request rebase, squash, split, etc, the 
>> >>>>>>>>> > commits, so that the history is most useful
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Thank you for the feedback on this matter! (And I don't think we
>> >>>>>>>>> should change the contribution guide)
>> >>>>>>>>>
>> >>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <[email protected]> 
>> >>>>>>>>> wrote:
>> >>>>>>>>> >
>> >>>>>>>>> > Hello,
>> >>>>>>>>> >
>> >>>>>>>>> > I have noticed an ongoing pattern of carelessness around 
>> >>>>>>>>> > issues/PR titles and
>> >>>>>>>>> > descriptions. It is really painful to see more and more examples 
>> >>>>>>>>> > like:
>> >>>>>>>>> >
>> >>>>>>>>> > BEAM-12160 Add TODO for fixing the warning
>> >>>>>>>>> > BEAM-12165 Fix ParquetIO
>> >>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
>> >>>>>>>>> > toMinutes (commit)
>> >>>>>>>>> >
>> >>>>>>>>> > In all these cases with just a bit of detail in the title it 
>> >>>>>>>>> > would be enough to
>> >>>>>>>>> > make other contributors or reviewers life easierm as well as to 
>> >>>>>>>>> > have a better
>> >>>>>>>>> > project history.  What astonishes me apart of the lack of care 
>> >>>>>>>>> > is that some of
>> >>>>>>>>> > those are from Beam commmitters.
>> >>>>>>>>> >
>> >>>>>>>>> > We already have discussed about not paying attention during 
>> >>>>>>>>> > commit merges where
>> >>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, and 
>> >>>>>>>>> > nothing has
>> >>>>>>>>> > changed so I am wondering if we should maybe just totally remove 
>> >>>>>>>>> > that rule (for
>> >>>>>>>>> > commits) and also eventually for titles and descriptions.
>> >>>>>>>>> >
>> >>>>>>>>> > Ismaël
>> >>>>>>>>> >
>> >>>>>>>>> > [1] https://beam.apache.org/contribute/
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Regards,
>> >>>>>>>>> Tomo
>> >>>>>
>> >>>>>

Reply via email to