I have mentioned this in the git mailing list as well because `git diff`/`git apply` are similarly affected (but not `git rebase` which is way more robust), here: https://lore.kernel.org/git/cafjau5savanhz0ampxjcbsvsnaijo+3x5otg_mntkx2gbik...@mail.gmail.com/
But I'll duplicate things here and have them modified to work for diff/patch, for example, to reproduce the issue: 1. download this file: $ wget https://raw.githubusercontent.com/rust-lang/cargo/0.76.0/src/cargo/core/workspace.rs 2. make a copy of file: src/cargo/core/workspace.rs $ cp workspace.rs mod.rs 4. manually edit the file ./mod.rs at line 1118 or manually go to function: pub fn emit_warnings(&self) -> CargoResult<()> { right before that function's end which looks like: Ok(()) } so there at line 1118, insert above that Ok(()) something, doesn't matter what, doesn't have to make sense, like: if seen_any_warnings { //comment bail!("reasons"); } save the file 5. try to generate a diff from it: $ diff -up workspace.rs mod.rs > /tmp/foo.patch you get this: ```diff --- workspace.rs 2024-07-04 09:53:54.713149533 +0200 +++ mod.rs 2024-07-04 09:54:25.356481728 +0200 @@ -1115,6 +1115,10 @@ impl<'cfg> Workspace<'cfg> { } } } + if seen_any_warnings { + //comment + bail!("reasons"); + } Ok(()) } ``` Now this is an ambiguous patch/hunk because if you try to apply this patch on the same original file, cumulatively, it applies successfully in 3 different places, but we won't do that now. 6. now let's pretend that something changed in the original code: $ wget -O workspace.rs https://raw.githubusercontent.com/rust-lang/cargo/0.80.0/src/cargo/core/workspace.rs that overwrote our original 7. make a copy: $ cp workspace.rs orig2.rs 8. reapply that patch on the new changes: $ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs /tmp/foo.patch Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |--- workspace.rs 2024-07-04 09:53:54.713149533 +0200 |+++ mod.rs 2024-07-04 09:54:25.356481728 +0200 -------------------------- patching file workspace.rs Using Plan A... Hunk #1 succeeded at 1099 (offset -16 lines). done 9. look at the applied diff, for no good reason, just to see that it's the same(kind of): $ diff -up orig2.rs workspace.rs ```diff --- orig2.rs 2024-07-04 10:02:05.659797977 +0200 +++ workspace.rs 2024-07-04 10:02:09.023131185 +0200 @@ -1099,6 +1099,10 @@ impl<'gctx> Workspace<'gctx> { } } } + if seen_any_warnings { + //comment + bail!("reasons"); + } Ok(()) } ``` Looks the same, content-wise But looking at the file itself with an editor: $ vim workspace.rs +1040 we're at function: fn validate_manifest(&mut self) -> CargoResult<()> { and at its end, line 1102, is where our patch got applied. So it got applied in the wrong spot, wrong function, because the context looked right. Rust files are more prone to have this happen to them due to `rustfmt` or `cargo fmt` which ensures a certain kind of formatting whereby braces are on single lines like that, thus easily creating similar contexts, and if the changes are "right" then a wrong context might be in the right spot for the hunk to be applied in the wrong spot, just like in this real life case that happened to me and luckily got caught at compile time rather than at runtime(by observing odd behavior, presumably). 10. can try re-applying it, to see that it works(2 more times), until runs out of similar contexts: $ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs /tmp/foo.patch Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |--- workspace.rs 2024-07-04 09:53:54.713149533 +0200 |+++ mod.rs 2024-07-04 09:54:25.356481728 +0200 -------------------------- patching file workspace.rs Using Plan A... Hunk #1 succeeded at 1180 (offset 65 lines). done $ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs /tmp/foo.patch Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |--- workspace.rs 2024-07-04 09:53:54.713149533 +0200 |+++ mod.rs 2024-07-04 09:54:25.356481728 +0200 -------------------------- patching file workspace.rs Using Plan A... Hunk #1 succeeded at 976 (offset -139 lines). done $ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs /tmp/foo.patch Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |--- workspace.rs 2024-07-04 09:53:54.713149533 +0200 |+++ mod.rs 2024-07-04 09:54:25.356481728 +0200 -------------------------- patching file workspace.rs Using Plan A... Reversed (or previously applied) patch detected! Skipping patch. Hunk #1 ignored at 1115. 1 out of 1 hunk ignored -- saving rejects to file workspace.rs.rej done Now our file looks like this: $ diff -up orig2.rs workspace.rs ```diff --- orig2.rs 2024-07-04 10:02:05.659797977 +0200 +++ workspace.rs 2024-07-04 10:05:41.939789948 +0200 @@ -976,6 +976,10 @@ impl<'gctx> Workspace<'gctx> { } } } + if seen_any_warnings { + //comment + bail!("reasons"); + } Ok(()) } @@ -1099,6 +1103,10 @@ impl<'gctx> Workspace<'gctx> { } } } + if seen_any_warnings { + //comment + bail!("reasons"); + } Ok(()) } @@ -1176,6 +1184,10 @@ impl<'gctx> Workspace<'gctx> { } } } + if seen_any_warnings { + //comment + bail!("reasons"); + } Ok(()) } ``` A fix, if desired, might be, to increase the context length at hunk generation, if that hunk is detected to can be applied(via `patch`, so to speak) in more than 1 place in the original file, until either some preset max context length is reached (in which case fail to generate diff), or the generated hunk is no longer ambiguous with respect to its application on the original file. However, this doesn't mean the hunk is now unambiguous, because ambiguity depends on the target file to be patched, therefore `patch` would make sure that each hunk of a patch, if applied independently couldn't be applied in more than 1 spot, if it can patching outta fail; furthermore, I think, if during the patch application the current hunk can be applied more than once (ie. in more than 1 spot) then patching outta fail as well (this means, the previously applied hunks of the patch must've created new spots for this hunk to can apply to, even though the hunk itself when applied independently found only 1 spot); I'm not sure if a third case should be tried as well, that is, after the patch was applied try to apply it again and if ANY hunks succeed, then it's also to be considered ambiguous and fail to patch it - unclear how necessary this would be. I've a proof of concept patch fix(as mentioned above) for `diffy` which is a rust crate/library which can do simple but compatible enough diff generation and patch application, which attempts to show that this context increase would work, and can generate an unambiguous patch/hunk (but it's unambiguous with respect to the original file, not with respect to any future files to be applied to, which is why the act of patching has to be modified as well to ensure the hunks are truly unambiguous), which is here: https://github.com/bmwill/diffy/issues/31 But of course, I did the `diffy` p.o.c. patch because it was easier for me to do in rust, than in C, and it's just to show a way that this could be done. Ideally, each ambiguous hunk would have only its own context length increased, instead of the whole diff/patch itself have it increased, but this is more difficult due to hunks merging into one as context length is higher, so keeping track seems more complex. Now I ask, if someone's so inclined, please consider making some kind of fix for this issue for `diff` and for `patch`, to ensure ambiguous hunks generation and application isn't possible. Maybe there are better ways to do this than what I mentioned, aside from increasing context length for only the ambiguous hunks(which I wish I could've done, but seems too complex), maybe a completely new way entirely. Thank you for your time and consideration. Have a great day everyone!
