On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> > 
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
> Hmm. That is problematic for two reasons.
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like "git
>      diff HEAD^".
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky thought it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zero-ed bytes. Unfortunately, requesting an extra byte does not
     do what we want; you get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast or faster than
     calling read() repeatedly.

I am really tempted by (1).

Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to