On Thu, 8 Sep 2016, Jeff King wrote:
> On Thu, Sep 08, 2016 at 09:29:58AM +0200, Johannes Schindelin wrote:
> > sorry for the late answer, I was really busy trying to come up with a new
> > and improved version of the patch series, and while hunting a bug I
> > introduced got bogged down with other tasks.
> No problem. I am not in a hurry.
I kind of am. The second half of September, I won't be able to do much of
anything Git-related, and this is a major bug that blocks some important
So I kind of have to press on that front.
> I think I'd rather just have:
> #ifndef REG_STARTEND
> #error "Your regex library sucks. Compile with NO_REGEX=NeedsStartEnd"
> #endif
Done. Although I permitted myself to reword this a little ;-)
> One other question about REG_STARTEND is: what does it do with NULs
> inside the buffer? Certainly glibc (and our compat/regex) treat it as a
> buffer with a particular length and ignore embedded NULs, as we want.
> But the NetBSD documentation says only:
> REG_STARTEND The string is considered to start at string +
> pmatch.rm_so and to have a terminating NUL
> located at string + pmatch.rm_eo (there need not
> actually be a NUL at that location),
> Besides avoiding a segfault, one of the benefits of regexec_buf() is
> that we will now find pickaxe-regex strings inside mixed binary/text
> files. But it's not clear to me that NetBSD's implementation does this.
> I guess we can assume it is fine (it is certainly no _worse_ than the
> current behavior), and if people's platforms do not handle it, they can
> build with NO_REGEX.
René mentioned in f96e567 (grep: use REG_STARTEND for all matching if
available, 2010-05-22) something along the lines of REG_STARTEND being
able to parse beyond NULs. My interpretation of NetBSD's documentation
agrees with your interpretation, though, that the buffers are still
thought of as being NUL-terminated, even if rm_eo makes the code *not*
look at that particular NUL.
Be that as it may: it is completely outside the purpose of my patch series
to take care of making it possible for Git's regex functions to match
buffers with embedded NULs. The only purpose of my patch series is to fix
the crash that was reported to me due to regexec() reading past a mmap()ed
buffer. I already let myself be talked into fixing more things than
that, and I have to leave it at that.