Reviving this thread, since I never quite arrived at a solution. Here's an 
interesting tool I found which attempts to solve this problem:
https://git.sr.ht/~nhaehnle/diff-modulo-base

However it's designed more for rebases, and it doesn't seem to produce the 
correct (where correct === what GitHub would show) output when looking at 
two commits with a merge in between.
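
One way to approximate that output, sketched under assumptions: re-create 
"old endpoint + whatever was merged in" as a synthetic merge, then diff that 
against the new endpoint. This is only a guess at the idea, not GitHub's 
documented algorithm. The scratch repository, file names and the `commit` 
helper below are made up for illustration, and the sketch assumes exactly one 
merge between the endpoints and no merge conflicts.

```shell
#!/bin/sh
# Sketch: rebuild the A..G scenario in a scratch repo, then diff C -> G
# while leaving out what came in through the merge of main.
set -eu

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

commit() {  # hypothetical helper: commit <file> <line> <message>
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit base.txt base "base"
git checkout -q -b my-topic-branch
for c in A B C; do commit topic.txt "$c" "$c"; done
old=$(git rev-parse HEAD)          # C: last commit the reviewer has seen
commit topic.txt D D

git checkout -q main
commit main.txt "landed on main" "collaborator change"
git checkout -q my-topic-branch
git merge -q --no-edit main
for c in E F G; do commit topic.txt "$c" "$c"; done
new=$(git rev-parse HEAD)          # G: current tip of the topic branch

# Plain two-endpoint diff: also contains the line that arrived via the merge.
plain=$(git diff "$old" "$new")

# The merge commit on the first-parent chain between the endpoints.
merge=$(git rev-list --merges --first-parent "$old..$new")

# Synthetic state: old endpoint with the merged-in side (second parent) applied.
git checkout -q --detach "$old"
git merge -q --no-edit "$merge^2"
filtered=$(git diff HEAD "$new")

printf '%s\n' "$filtered"
```

In this toy case the filtered diff should touch only topic.txt (lines D 
through G), while the plain diff also adds main.txt. A real implementation 
would still have to deal with conflicts in the synthetic merge, several 
merges in the range, and shallow or partial clones.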

- Sam

On Wednesday, 22 February 2023 at 04:58:46 UTC-8 philip...@iee.email wrote:

> further example courtesy of Microsoft ... " Git history simplification can 
> be a confusing beast. "
>  
> https://learn.microsoft.com/en-us/azure/devops/repos/git/git-log-history-simplification?view=azure-devops
>
> (ignore those '&gt' vs '>' html errors in the git code examples ;-) 
> On Wednesday, February 22, 2023 at 12:23:50 PM UTC Philip Oakley wrote:
>
>> I haven't tried to follow that example properly yet.. 
>>
>> However one other thing to look at is the "History Simplification" that 
>> includes parent re-writing that's in the rev-list-options.txt file and then 
>> included in a number of man pages (log, show, shortlog, ...).  There are 
>> some slippery concepts in there, often context dependent!
>>
>> https://git-scm.com/docs/git-log#_history_simplification
>>
>> On Wednesday, February 22, 2023 at 5:00:51 AM UTC s...@codeapprove.com 
>> wrote:
>>
>>> Thank you both for getting back to me. The discussion in the docs about 
>>> flattening was really interesting!  I should note that the git clone / git 
>>> log command pair I provided gives me almost exactly what I want, but I need 
>>> to combine the diffs. It seems to contain the correct changes, and the 
>>> speed is pretty good too.
>>>
>>> Let me give an example of the situation I am optimizing for. I apologize 
>>> in advance: I am going to use GitHub terms which I know are not pure git, 
>>> but in the end my question is a git question. 
>>>
>>> Say you're a developer working in a many-developer repository. Here's 
>>> the sequence:
>>>
>>>    - On Day 0 you check out "main" and create "my-topic-branch". You 
>>>    add commits A, B, C, D to that branch. 
>>>    - Now you open a pull request on GitHub asking to merge your branch 
>>>    "my-topic-branch" into "main". 
>>>    - You see a collaborator has landed a change to "main" since you 
>>>    started. So you do "git fetch origin main && git merge main" and make a 
>>>    merge commit in your branch. 
>>>    - Then you add three more commits E, F, G on top of that and push 
>>>    your branch again. So you have: A, B, C, D, (merge main), E, F, G.
>>>    - A coworker has already looked at commits A, B, C and wants to see 
>>>    what you've done since then. So they ask GitHub to show the diff from 
>>>    commits D through G (including the merge).
>>>
>>> When you do this, GitHub does something which (to me, anyway) is pretty 
>>> magical. You are shown only the changes that you committed to your branch 
>>> in D, E, F, and G. Changes which you merged in, which may or may not 
>>> involve the files in your Pull Request, are not shown at all since they're 
>>> not "yours".
>>>
>>> Here's a public example showing a team using this pattern. This one has 
>>> multiple merges, so I may need to find a cleaner example but hopefully this 
>>> makes sense.
>>>
>>>    - Consider this PR: 
>>>    https://github.com/firebase/firebase-tools/pull/5478/files
>>>       - This is the full diff (according to GitHub) and we can see 
>>>       exactly one added line in CHANGELOG.md
>>>    - Here's a merge commit: 
>>>    https://github.com/firebase/firebase-tools/pull/5478/commits/ebce28ceb799f721d36b986705c54cbcd597a27a
>>>       - We can see that on the base branch, "master", a line was added 
>>>       to the *end* of the CHANGELOG.md file. There is no such addition 
>>>       displayed in the full diff.
>>>    - Here's a "magic" diff where I selected three commits (before 
>>>    merge, merge, and after merge): 
>>>    https://github.com/firebase/firebase-tools/pull/5478/files/28b8a72561b266a2086059c0d9840ab25f03d8ae..b2d89ebd67e3f8c17c4c607c630c18096303096b
>>>       - We can see that the changes from the merge commit are not shown 
>>>       as additions! But they are present as context lines.
>>>    
>>> I need to find a sequence of git commands to produce the same exact diff 
>>> that GitHub produces (and ideally do it very quickly even in a large 
>>> repository) and I just can't figure it out.
>>>
>>> Thanks,
>>> Sam
>>>
>>>    
>>>
>>>
>>> On Tuesday, 21 February 2023 at 14:12:48 UTC-8 philip...@iee.email wrote:
>>>
>>>> This may also be an issue of the History Simplification process and / 
>>>> or the 'flattening' processes for history linearisation and rebases.
>>>>
>>>> The flattening is a known phenomenon and is currently being discussed 
>>>> on the Git List, so I have noted this there. 
>>>> [1] https://lore.kernel.org/git/a856dd16-9876-509b...@iee.email/ 
>>>> <https://lore.kernel.org/git/a856dd16-9876-509b-6a99-11ea0020633c@iee.email/>
>>>>
>>>> There is a technical discussion of flattening in the docs at 
>>>> https://github.com/git/git/blob/master/Documentation/howto/keep-canonical-history-correct.txt
>>>>  
>>>>
>>>> Do note the original email title  "Pull is mostly evil" ;-) (whole 
>>>> thread at https://lore.kernel.org/git/5363bb9...@xiplink.com/ 
>>>> <https://lore.kernel.org/git/5363bb9f.40...@xiplink.com/>)
>>>>
>>>> Clarifying the "excluding merge commit changes" part (or any 
>>>> misunderstandings, if there were some) would be really useful. The existing 
>>>> devs do have the 'curse of knowledge', so they often can't see the problems.
>>>> On Tuesday, February 21, 2023 at 5:29:36 PM UTC Konstantin Khomoutov 
>>>> wrote:
>>>>
>>>>> On Mon, Feb 20, 2023 at 09:27:20PM -0800, 'Samuel Stern' via Git for 
>>>>> human beings wrote: 
>>>>>
>>>>> > This is an *extremely* specific question which I've been trying to get 
>>>>> > an answer to for quite a while now, so hopefully someone here knows the 
>>>>> > answer. 
>>>>> > 
>>>>> > Let's say I am starting from nothing, an empty directory on a server. I 
>>>>> > have: 
>>>>> > 
>>>>> > - The URL for a public git repository 
>>>>> > - Two endpoint SHAs (commits on the same branch) 
>>>>> > 
>>>>> > I want to get the complete diff between those commits *excluding* merge 
>>>>> > commit changes, and I want to do this as fast as possible (so much 
>>>>> > faster than cloning everything and diffing). 
>>>>> > 
>>>>> > I am able to get almost there with the following sequence: 
>>>>> > 
>>>>> > # Fast clone 
>>>>> > git clone --verbose --no-checkout --filter=blob:limit=250k \
>>>>> >   --single-branch --branch=${branch} --depth=${depth} $REPO_URL 
>>>>> > 
>>>>> > # Get a series of patches 
>>>>> > git log --no-merges --first-parent --patch ${base.sha}..${head.sha} 
>>>>> > 
>>>>> > However I need to get a *single* patch that represents all the changes 
>>>>> > combined, not a series of patches from the log. 
>>>>>
>>>>> Isn't a mere 
>>>>>
>>>>> git diff ${base.sha} ${head.sha} 
>>>>>
>>>>> what you're looking for? 
>>>>>
>>>>> Otherwise, I'm with Philip in that your statements (rephrased) 
>>>>>
>>>>> - I want to get a single combined change ("patch") describing the literal 
>>>>> set of changes between such and such commits. 
>>>>>
>>>>> - I want changes brought in by merge commits excluded. 
>>>>>
>>>>> contradict each other: I could in principle envision some algorithm which 
>>>>> would try to incrementally produce a diff as it walks a chain of commits 
>>>>> and tries to ignore the changes introduced by merge commits located in 
>>>>> that chain, but leaving aside the fact that such an algorithm would be 
>>>>> very brittle for any real-world cases, I simply see no use for it - even 
>>>>> a theoretical one. 
>>>>>
>>>>>
>>>>> You might have been trapped by the fact that you found `git log` first in 
>>>>> your search, and this command traverses all individual commits in the 
>>>>> subgraph it's told to traverse - including "sidelines" brought in by merge 
>>>>> commits. Instead, plain old `git diff` does not traverse anything: it 
>>>>> takes two states of the project and compares them. 
>>>>>
>>>>>
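
The difference between the two commands can be seen in a throwaway 
repository. This is a minimal sketch; the repository, the file name 
notes.txt and the commit messages are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: `git log --patch` emits one patch per commit in the range,
# `git diff` emits a single patch between the two endpoint states.
set -eu

# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

echo one > notes.txt
git add notes.txt
git commit -q -m "first"
base=$(git rev-parse HEAD)

echo two >> notes.txt
git commit -q -am "second"
echo three >> notes.txt
git commit -q -am "third"
head=$(git rev-parse HEAD)

# Walks the commits in base..head and prints a separate patch for each.
log_patches=$(git log --patch "$base..$head" | grep -c '^diff --git')

# Compares only the two endpoint trees and prints one combined patch.
diff_patches=$(git diff "$base" "$head" | grep -c '^diff --git')

echo "log: $log_patches patch(es), diff: $diff_patches patch(es)"
# prints: log: 2 patch(es), diff: 1 patch(es)
```

With a merge in the range, the single `git diff` patch would also include 
whatever the merge brought in, which is the part the original question wants 
to leave out.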
