On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:
> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
> Hmm. That is problematic for two reasons.
> 1. The whole diff call chain will have to be modified to pass the
> options around, so they can make it down to the
> diff_populate_filespec level. Alternatively, we could do some kind
> of global hack, which is ugly but would work OK in practice.
> 2. Reusing a working tree file is only half of the reason a filespec
> might be mmap'd. It might also be because we are literally diffing
> the working tree. "-G" was meant to be used to limit log traversal,
> but it also works to reduce the diff output for something like "git
> diff HEAD^".
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.
Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:
  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zeroed bytes. Unfortunately, when the size does fall on a page
     boundary, requesting an extra byte does not do what we want; the
     extra byte lies in a page past the end of the file, and you get
     SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast or faster than
     calling read() repeatedly.
I am really tempted by (1).
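To make (1) concrete: the "slightly different interface" in the GNU
implementation is re_search(), which takes an explicit length. Below is a
rough sketch of how a pointer/size match could look. It assumes a GNU
regex.h (glibc with _GNU_SOURCE, or the copy in compat), and
buffer_matches() is a made-up helper name, not anything that exists in git.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the extended regex "pattern" matches
 * anywhere in buf[0..len), 0 if it does not, and -1 on error. "buf" does
 * not need a trailing NUL; re_search() is told the length explicitly.
 */
static int buffer_matches(const char *pattern, const char *buf, int len)
{
	struct re_pattern_buffer pb;
	int pos;

	/* A zeroed pattern buffer lets the library allocate what it needs. */
	memset(&pb, 0, sizeof(pb));
	re_set_syntax(RE_SYNTAX_POSIX_EXTENDED);

	/* re_compile_pattern() returns NULL on success, else an error string. */
	if (re_compile_pattern(pattern, strlen(pattern), &pb))
		return -1;

	/* Try start offsets 0..len; NULL means no match registers are wanted. */
	pos = re_search(&pb, buf, len, 0, len, NULL);
	regfree(&pb);

	if (pos == -2)
		return -1;	/* internal error */
	return pos >= 0;	/* -1 means no match */
}
```

Note that this recompiles the pattern on every call, which real code would
not want to do; the point is only that no terminator is ever needed.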
Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.
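
For comparison, (3) needs nothing beyond the standard regexec interface.
A minimal sketch, assuming the caller already has a compiled regex_t and a
pointer/size pair; regexec_buf() is a hypothetical name used only for
illustration:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: run regexec() over buf[0..len), which may be
 * mmap'd and therefore lack a trailing NUL. Returns 1 on match, 0 on
 * no match, and -1 if the temporary copy cannot be allocated.
 */
static int regexec_buf(const regex_t *re, const char *buf, size_t len)
{
	char *copy = malloc(len + 1);
	int ret;

	if (!copy)
		return -1;
	memcpy(copy, buf, len);
	copy[len] = '\0';	/* the terminator that regexec() requires */

	/* No match offsets are requested, only a yes/no answer. */
	ret = regexec(re, copy, 0, NULL, 0);
	free(copy);
	return ret == 0;
}
```

The copy is made only at the point of use, so callers that never run a
regex pay nothing. As with any regexec-based approach, a NUL byte embedded
in the data still ends the search early.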