On 2012-10-28 13:01, Jeff King wrote:
> On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote:
>> It seems "git diff-tree -Ganything <tree>" crashes with a null
>> pointer dereference
>> when run on a commit that adds a file (pdf) with a textconv filter.
>> It can be reproduced with vanilla git by having a commit on top that
>> adds a file with a textconv filter and executing git diff-tree
>> -Ganything HEAD
>> But running git log -Ganything still works without a crash.
>> This problem seems to exist since the feature was first added in f506b8e8b5.
> Thanks for a thorough bug report. I didn't reproduce the crash, but I
> can see how it happens (it happens with diff-tree because we will reuse
> the working tree file in that instance; it could also happen if you
> turned on textconv caching).
>> While testing I also noticed the -S and -G act on the original file
>> instead of the textconv munged data.
>> Is this intentional or caused by accessing the wrong data?
> Both, perhaps. :)
thanks for your patch for this!
> -G operates on the munged data; you can see it feed the munged data to
> xdiff in diff_grep. But the optimization for handling added and removed
> files accidentally fed the wrong pointer. Fixing that is a no-brainer,
> since the optimization is inconsistent with the regular code path.
> -S, however, predates the invention of textconv and has never used it.
> It is a little less clear that textconv is the right thing here, because
> it is not about grepping the diff, but about counting occurrences of the
> string inside the file content. I tend to think that doing so on the
> textconv'd data would be what people generally want, but it is a
> behavior change.
>> Wild guess: should we really access p->one->data and not mf1.ptr?
> Precisely. Thanks for your wild guess; it made finding the bug very
> easy. :)
>> Is there some more information i should provide?
> The patch below should fix it. I added tests, but please try your
> real-world test case on it to double-check.
I tested your patch, but now it crashes for another reason :-)
i have a file with exactly 12288(0x3000) bytes in the repository.
When the file is loaded, the data is placed luckily so the data end
falls at a page boundary.
Later diff_grep() calls regexec() which calls strlen() on the loaded buffer
and ends up reading beyond the actual data into the next page
which is not allocated and causes a pagefault.
Or it could possibly (randomly) match the regex on data that is not
actually part of a file...
Different memory allocation rules on Windows probably also have some
My guess is that diff_filespec->data is not supposed to be zero terminated
and we should not invoke strlen() on a not zero terminated data.
But this should be decided by somebody who knows the rules.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html