Re: crash on git diff-tree -Ganything for new files with textconv filter

Junio C Hamano Tue, 30 Oct 2012 05:46:23 -0700

(1) sounds attractive for more than one reason. In addition to avoidance of 
this issue, it would bring bug-to-bug compatibility across platforms.


(4), if we can run grep on streaming data (by tweaking the interface we have for
checking out a large blob to the working tree), would let us work on datasets
larger than fit in core. Even though it would be much more work, it might turn
out to be a better option in the longer run.

Jeff King <[email protected]> wrote:

>On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:
>
>> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
>> 
>> > The patch below fixes it, but it's terribly inefficient (it just
>> > detects the situation and reallocates). It would be much better to
>> > disable the reuse_worktree_file mmap when we populate the filespec,
>> > but it is too late to pass an option; we may have already populated
>> > from an earlier diffcore stage.
>> > 
>> > I guess if we teach the whole diff code that "-G" (and
>> > --pickaxe-regex) is brittle, we can disable the optimization from
>> > the beginning based on the diff options. I'll take a look.
>> 
>> Hmm. That is problematic for two reasons.
>> 
>>   1. The whole diff call chain will have to be modified to pass the
>>      options around, so they can make it down to the
>>      diff_populate_filespec level. Alternatively, we could do some
>>      kind of global hack, which is ugly but would work OK in practice.
>> 
>>   2. Reusing a working tree file is only half of the reason a
>>      filespec might be mmap'd. It might also be because we are
>>      literally diffing the working tree. "-G" was meant to be used to
>>      limit log traversal, but it also works to reduce the diff output
>>      for something like "git diff HEAD^".
>> 
>> I really wish there were an alternate regexec interface we could use
>> that took a pointer/size pair. Bleh.
>
>Thinking on it more, my patch, hacky though it seems, may not be the
>worst solution. Here are the options that I see:
>
>  1. Use a regex library that does not require NUL termination. If we
>     are bound by the regular regexec interface, this is not feasible.
>     But the GNU implementation works on arbitrary-length buffers (you
>     just have to use a slightly different interface), and we already
>     carry it in compat. It would mean platforms which provide a working
>     but non-GNU regexec would have to start defining NO_REGEX.
>
>  2. Figure out a way to get one extra zero byte via mmap. If the
>     requested size does not fall on a page boundary, you get extra
>     zero-ed bytes. Unfortunately, requesting an extra byte does not
>     do what we want; you get SIGBUS accessing it.
>
>  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
>     way we only incur the cost when we need it.
>
>  4. Avoid mmap-ing in the first place when we are using -G or
>     --pickaxe-regex (e.g., by doing a big read()). At first glance,
>     this sounds more efficient than loading the data one way and then
>     making another copy. But mmap+memcpy, aside from the momentary
>     doubled memory requirement, is probably just as fast or faster than
>     calling read() repeatedly.
>
>I am really tempted by (1).
>
>Given that (2) does not work, unless somebody comes up with something
>clever there, that would make (3) the next best choice.
>
>-Peff
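
For illustration, option (3) above could be sketched roughly as follows.
This is only a minimal sketch under stated assumptions, not the patch
discussed in this thread: regexec_buf_copy() is a hypothetical helper
name, not a function in git, and it compiles the pattern on every call
purely to keep the example self-contained. It copies a pointer/size pair
into a NUL-terminated buffer at the point of use and hands that copy to
plain POSIX regexec().

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git code): match a regex against a buffer
 * that is NOT NUL-terminated, such as an mmap'd file whose size is an
 * exact multiple of the page size.
 *
 * Returns 1 on match, 0 on no match, -1 on error.
 *
 * Caveat: a NUL byte embedded in buf still ends the search early,
 * because regexec() treats the copy as a C string.
 */
static int regexec_buf_copy(const char *pattern, const char *buf, size_t len)
{
	regex_t re;
	char *copy;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB))
		return -1;

	/* The extra byte holds the terminator that regexec() requires. */
	copy = malloc(len + 1);
	if (!copy) {
		regfree(&re);
		return -1;
	}
	memcpy(copy, buf, len);
	copy[len] = '\0';

	ret = regexec(&re, copy, 0, NULL, 0) == 0;

	free(copy);
	regfree(&re);
	return ret;
}
```

The cost is one allocation and one memcpy per searched buffer, paid only
on the regex path, which matches the "only incur the cost when we need
it" property described for (3). A caller would pass the mapped pointer
and its length, e.g. regexec_buf_copy("ba+r", data, size).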

