On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
>               /* Ignore commit comments */
>               if (!patchlen && !starts_with(line, "diff "))
>                       continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    index 9183875dad..dd8abe9185 100644
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store 
*refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Observe that the diff --git line matters, we hash it:

    $ git diff-tree -p HEAD~.. | git patch-id
    5870d115b7e2a9a936ab8fdc254932234413c710 
0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id 
--stable
    5870d115b7e2a9a936ab8fdc254932234413c710 
0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id 
--stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 
0000000000000000000000000000000000000000

The thing it doesn't care about is the "index" between the "diff" and
patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 
0000000000000000000000000000000000000000

We also care about the +++ and --- lines:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
    56985c2c38cce6079de2690082e1770a8e81214c 
0000000000000000000000000000000000000000

Then we normalize the @@ line, e.g.:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 
0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| perl -pe 's/\d+/123/g' | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 
0000000000000000000000000000000000000000


There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
    
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -123,123 +123,123 @@ static void files_reflog_path(struct 
files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

To be:

    diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!

> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow.  (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan

Reply via email to