In the past github squash and merge did not preserve the committer
identity correctly, is it still the case? If  so we should not be
using it.
https://github.com/isaacs/github/issues/1368

On Thu, Apr 22, 2021 at 8:41 PM Robert Bradshaw <rober...@google.com> wrote:
>
> On Thu, Apr 22, 2021 at 11:29 AM Valentyn Tymofieiev <valen...@google.com> 
> wrote:
>>
>> I always squash-and-merge even when there is only 1 commit. This avoids the 
>> necessity to edit the commit message to remove not so helpful "Merge pull 
>> request xxx" message. Is there any harm to recommend squash by default in 
>> the upcoming squash bot even for single commit PRs?
>
>
> Does squash-and-merge in that case preserve the commit as-is if there's only 
> one? In that case, there'd be no issues of history. (I opted to not comment 
> on 1-commit PRs to be less chatty.)
>
>>
>>
>> On Thu, Apr 22, 2021 at 11:19 AM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> On Thu, Apr 22, 2021 at 9:33 AM Kenneth Knowles <k...@apache.org> wrote:
>>>>
>>>>
>>>> On Thu, Apr 22, 2021 at 7:04 AM Alexey Romanenko 
>>>> <aromanenko....@gmail.com> wrote:
>>>>>
>>>>> Thanks Ismael for bringing this on the table again. Kind of my 
>>>>> “favourite” topic, unfortunately, that I raised a couple of times… Let me 
>>>>> share some of my thoughts on this.
>>>>>
>>>>> First of all, as Beam developers, honestly we have to agree if we care 
>>>>> about our commits history or not. If not (or not so much) then probably 
>>>>> there is no more things to discuss and we use Git as just Git… It’s not a 
>>>>> bad thing, it’s just different but for large projects, like Beam, clear 
>>>>> commits history is ultra important, imho.
>>>>>
>>>>> Well, for now we do care and we clearly mention this in our Contribution 
>>>>> Guide. Probably, it sounds only as a recommendation there or not all 
>>>>> contributors (especially first-time ones) read this or take this into 
>>>>> account or pay attention on this. It’s fine and we always can expect not 
>>>>> following our guide because of many different reasons. And this is 
>>>>> exactly where Committers have to play their role! I mean that our clear 
>>>>> Git history mostly relies on committer's shoulders and, before clicking 
>>>>> on Merge button, every committer have (even “must" I’d say) make sure 
>>>>> that PR respects all our rules (we have them because of some reasons, 
>>>>> right?) and ready to be merged. Nice and correct titles/messages is one 
>>>>> this thing. Personally, the first thing that I do once I start to do a 
>>>>> review and before merge, is checking the PR’s title, branches (if it’s 
>>>>> from a feature branch and against main Beam branch), number of commits 
>>>>> and their messages. Then I take a look on related Jira issue which ID 
>>>>> should be prefixed to PR's title and commit’s message(s).
>>>>>
>>>>> For sure, there are always exceptions. In case of emergency, for example, 
>>>>> if the build is broken because of tiny thing then it makes sense to fix 
>>>>> this as fast as possible and then, probably, to neglect some rules. But 
>>>>> if exceptions become the common practice and happen quite often, then 
>>>>> it’s a signal that either we have to change the rules or change our 
>>>>> attitude to this.
>>>>>
>>>>> As I see, the initial Ismael’s message of his email was more about titles 
>>>>> and multiple commits per PR is another but, of course, related topic. For 
>>>>> both, I believe we can partly automate it - add checks to prevent merging 
>>>>> the commits with not correct messages or/and limit number of commits per 
>>>>> PR, for example. Some other big projects, like Apache Spark, have even 
>>>>> special tool to merge PR in well-formed way [1]. I’m not sure that we 
>>>>> need to have something similar because I’m pretty sure it will affect the 
>>>>> performance of adding new fixes/features (at least in the beginning), but 
>>>>> since we already started the similar discussions several times on regular 
>>>>> bases, we might want to think in this way as an option too.
>>>>
>>>>
>>>> Noting that we had one too [1]. The trouble was that the bot had a lot of 
>>>> downtime, code was not part of Beam's repo, and also did not encode best 
>>>> practices (for example it broke the connection between PRs and master 
>>>> branch history because it just cherry-picked and squashed commits and 
>>>> pushed those new unrelated commits straight to master). A script would 
>>>> address much of this.
>>>
>>>
>>> Yeah, the mergebot was much more hassle than it was worth, and lots harder 
>>> to use than pushing a button. I wouldn't be opposed to trying again with a 
>>> better (simpler, under our control) one (and in my investigations of github 
>>> actions, it doesn't look that hard).
>>>
>>>> Heuristic CI that says "this commit history looks OK" might solve a lot of 
>>>> the problem (I see that Robert already started on this).
>>>>
>>>> And finally I was to repeat my agreement with Ismaël and Alexey that the 
>>>> root problem is this: we need to actually care about the commit history 
>>>> and communication of PR/commit titles and descriptions. We use tools to 
>>>> help us to implement our intentions and to communicate them to newcomers, 
>>>> but I don't think this will replace taking care of the repo.
>>>
>>>
>>> Committers should care about taking care of the repo more than the average 
>>> contributor, but even there there is high variance. I think the issue is 
>>> "oh, I didn't think to squash vs. merge" rather than "who cares, I always 
>>> press merge anyway" in which case a timely reminder will go a long way.
>>>
>>>> Kenn
>>>>
>>>> [1] 
>>>> https://lists.apache.org/thread.html/4a65fb0b66935c9dc61568a3067538775edc3e685c6ac03dd3fa4725%40%3Cdev.beam.apache.org%3E
>>>>
>>>>>
>>>>>
>>>>> As for me, I’d prefer that every committer paid more attention (if not 
>>>>> yet) on these “non code” things before reviewing/merging a PR.
>>>>>
>>>>> [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
>>>>>
>>>>>
>>>>> On 22 Apr 2021, at 01:28, Robert Bradshaw <rober...@google.com> wrote:
>>>>>
>>>>> I am also in the camp that it often makes sense to have more than 1 
>>>>> commit per PR, but rather than enforce a 1 commit per PR policy, I would 
>>>>> say that it's too much bother to look at the commit history whether it 
>>>>> should be squashed or merged (though I think it is almost always very 
>>>>> obvious which is preferable for a given PR), go ahead and squash rather 
>>>>> than merge by default.
>>>>>
>>>>>
>>>>> On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>>>
>>>>>> This seems to come up a lot. Maybe we should change something?
>>>>>>
>>>>>> Having worked on a number of projects and at companies with this policy, 
>>>>>> companies using non-distributed source control, and companies that just 
>>>>>> "use git like git", I know all these ways of life pretty well.
>>>>>>
>>>>>> TL;DR my experience is:
>>>>>>  - when people care about the commit history and take care of it, then 
>>>>>> just "use git like git" results in faster development and clearer 
>>>>>> history, despite intermediate commits not being tested by 
>>>>>> Jenkins/Travis/GHA
>>>>>>  - when people see git as an inconvenience, view the history as an 
>>>>>> implementation detail, or think in linear history of PR merges and view 
>>>>>> the commits as an implementation detail, it becomes a mess
>>>>>>
>>>>>> Empirically, this is what I expect from a 1 commit = 1 PR policy (and 
>>>>>> how I feel about each point):
>>>>>>  - fewer commits with bad messages (yay!)
>>>>>>  - simpler git graph if we squash + rebase (meh)
>>>>>>  - larger commits of related-but-independent changes (could be OK)
>>>>>>  - commits with bullet points in their description that bundle unrelated 
>>>>>> changes (sad face)
>>>>>>  - slowdown of development (neutral - slow can be good)
>>>>>>  - fewer "quality of life" improvements, since those would add lines of 
>>>>>> diff to a PR and are off topic; when they have to be done in a separate 
>>>>>> PR they don't get done and they don't get reviewed with the same 
>>>>>> priority (extra sad face)
>>>>>>
>>>>>> <rant>I know I am in the minority. I tend to have a lot of PRs where 
>>>>>> there are 2-5 fairly independent commits. It is "to aid code review" but 
>>>>>> not in the way you might think: The best size for code review is pretty 
>>>>>> big, compared to the best size for commit. A commit is the unit of 
>>>>>> roll-forward, roll-back, cherry-pick, etc. Brian's point about commits 
>>>>>> not being independently tested is important: this is a tooling issue, 
>>>>>> but not that easy to change. Here is why I am not that worried about it: 
>>>>>> I believe strongly in a "rollback first" policy to restore greenness, 
>>>>>> but also that the rollback change itself must be verified to restore 
>>>>>> greenness. When a multi-commit PR fails, you can easily open a revert of 
>>>>>> the whole PR as well as reverts of individual suspect commits. The CI 
>>>>>> for these will finish around the same time, and if you manage a smaller 
>>>>>> revert, great! Imagine if to revert a PR you had to revert _every_ 
>>>>>> change between HEAD and that PR. It would restore to a known green 
>>>>>> state. Yet we don't do this, because we have technology that makes it 
>>>>>> unnecessary. Ultimately, single large commits with bullet points are 
>>>>>> just an unstructured version of multi-commit PRs. So I favor the 
>>>>>> structure. But people seem to be more likely to write good bullet points 
>>>>>> than to write independent commits. Perhaps because it is easier.</rant>
>>>>>>
>>>>>> So at this point, I think I am OK with a 1 commit per PR policy. I think 
>>>>>> the net benefits to our commit history would be good. I have grown tired 
>>>>>> of repeating the conversation. Rebase-and-squash edits commit ids in 
>>>>>> ways that confuses tools, so I do not favor this. Tooling that merges 
>>>>>> one commit at a time (without altering commit id) would also be super 
>>>>>> cool and not that hard. It would prevent intermediate results from 
>>>>>> merging, solving both problems.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette <bhule...@google.com> 
>>>>>> wrote:
>>>>>>>
>>>>>>> I'd argue that the history is almost always "most useful" when one PR 
>>>>>>> == one commit on master. Intermediate commits from a PR may be useful 
>>>>>>> to aid code review, but they're not verified by presubmits and thus 
>>>>>>> aren't necessarily independently revertible, so I see little value in 
>>>>>>> keeping them around on master. In fact if you're breaking up a PR into 
>>>>>>> multiple commits to aid code review, it's worth considering if they 
>>>>>>> could/should be separately reviewed and verified PRs.
>>>>>>> We could solve the unwanted commit issue if we have a policy to always 
>>>>>>> "Squash and Merge" PRs with rare exceptions.
>>>>>>>
>>>>>>> I agree jira/PR titles could be better, I'm not sure what we can do 
>>>>>>> about it aside from reminding committers of this responsibility. 
>>>>>>> Perhaps the triage process can help catch poorly titled jiras?
>>>>>>>
>>>>>>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw <rober...@google.com> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this 
>>>>>>>> up.
>>>>>>>>
>>>>>>>> For merging unwanted commits, can we automate a simple check (e.g. 
>>>>>>>> with github actions)?
>>>>>>>>
>>>>>>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki <suzt...@google.com> wrote:
>>>>>>>>>
>>>>>>>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
>>>>>>>>> [1], I see I was not following this
>>>>>>>>>
>>>>>>>>> > The reviewer should give the LGTM and then request that the author 
>>>>>>>>> > of the pull request rebase, squash, split, etc, the commits, so 
>>>>>>>>> > that the history is most useful
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you for the feedback on this matter! (And I don't think we
>>>>>>>>> should change the contribution guide)
>>>>>>>>>
>>>>>>>>> [1] https://beam.apache.org/contribute/committer-guide/
>>>>>>>>>
>>>>>>>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía <ieme...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Hello,
>>>>>>>>> >
>>>>>>>>> > I have noticed an ongoing pattern of carelessness around issues/PR 
>>>>>>>>> > titles and
>>>>>>>>> > descriptions. It is really painful to see more and more examples 
>>>>>>>>> > like:
>>>>>>>>> >
>>>>>>>>> > BEAM-12160 Add TODO for fixing the warning
>>>>>>>>> > BEAM-12165 Fix ParquetIO
>>>>>>>>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
>>>>>>>>> > toMinutes (commit)
>>>>>>>>> >
>>>>>>>>> > In all these cases with just a bit of detail in the title it would 
>>>>>>>>> > be enough to
>>>>>>>>> > make other contributors or reviewers life easierm as well as to 
>>>>>>>>> > have a better
>>>>>>>>> > project history.  What astonishes me apart of the lack of care is 
>>>>>>>>> > that some of
>>>>>>>>> > those are from Beam commmitters.
>>>>>>>>> >
>>>>>>>>> > We already have discussed about not paying attention during commit 
>>>>>>>>> > merges where
>>>>>>>>> > some PRs end up merging tons of 'unwanted' fixup commits, and 
>>>>>>>>> > nothing has
>>>>>>>>> > changed so I am wondering if we should maybe just totally remove 
>>>>>>>>> > that rule (for
>>>>>>>>> > commits) and also eventually for titles and descriptions.
>>>>>>>>> >
>>>>>>>>> > Ismaël
>>>>>>>>> >
>>>>>>>>> > [1] https://beam.apache.org/contribute/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Tomo
>>>>>
>>>>>

Reply via email to