On Sun, Sep 25, 2016 at 05:10:31PM -0700, Junio C Hamano wrote:
> Gustavo Grieco <[email protected]> writes:
>
> > We found a stack read out-of-bounds parsing object files using git 2.10.0.
> > It was tested on ArchLinux x86_64. To reproduce, first recompile git with
> > ASAN support and then execute:
> >
> > $ git init ; mkdir -p .git/objects/b2 ; printf 'x' > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0
>
> Interesting. If you prepare such a broken loose object file in your
> local repository, I would expect that either unpack_sha1_header() or
> unpack_sha1_header_to_strbuf() that sha1_loose_object_info() calls
> would detect and barf by noticing that an error came from libz while
> it attempts to inflate and would not even call parse_sha1_header.
>
> But it is nevertheless bad to assume that whatever happens to
> inflate without an error must be formatted correctly to allow
> parsing (i.e. has ' ' and then NUL termination within the first 32
> bytes after inflation), which is exactly what the hdr[32] is saying.

Yeah. I was also surprised that we didn't barf on a zlib failure. But
based on previous debugging of corrupted zlib data, my recollection
is that zlib will happily pass back data for a large number of weird
corruptions and only later complain about a checksum error. So
presumably "x" is one of those, and it might not hold for other
corruptions (but I didn't try).

> Note that this is totally untested and not thought through; I
> briefly thought about what unpack_sha1_header_to_strbuf() does with
> this change (it first lets unpack_sha1_header() attempt with a
> small buffer, but it seems to discard the error code from it before
> seeing if the returned buffer has NUL in it); there may be bad
> interactions with it.

Yeah, that seems wrong. I don't think it would involve an out-of-bounds
read, but we probably could fail to correctly report zlib corruption.

> diff --git a/sha1_file.c b/sha1_file.c
> index 60ff21f..dfcbd76 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1648,6 +1648,8 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
>  
>  int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
>  {
> +	int status;
> +
>  	/* Get the data stream */
>  	memset(stream, 0, sizeof(*stream));
>  	stream->next_in = map;
> @@ -1656,7 +1658,15 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>  	stream->avail_out = bufsiz;
>  
>  	git_inflate_init(stream);
> -	return git_inflate(stream, 0);
> +	status = git_inflate(stream, 0);
> +	if (status)
> +		return status;
> +
> +	/* Make sure we got the terminating NUL for the object header */
> +	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
> +		return -1;
> +
> +	return 0;

This doesn't look too invasive as an approach, though I would have done
it differently. We're making the assumption that once there is a NUL,
the header-parser won't do anything stupid, which creates a coupling
between those two bits of code. My inclination would have been to just
treat the header as a ptr/len pair, and make sure the parser never reads
past the end.

But I implemented that, and it _is_ rather invasive. And it's not like
coupling unpack_sha1_header() and parse_sha1_header() is all that
terrible; they are meant to be paired.
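
For illustration only, the ptr/len idea could look roughly like the
sketch below. It is not the patch mentioned above and not git's code:
parse_header_bounded() and its signature are hypothetical, it handles
only the plain "<type> <size>\0" layout, and it ignores overflow of the
size field. The point is that every read is checked against the end of
the buffer, so the parser does not depend on the caller having verified
the NUL first.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): parse "<type> <size>\0" from a
 * buffer of exactly `len` bytes without ever reading hdr[len] or beyond.
 * Returns 0 and stores the type length and size on success, or -1 on a
 * malformed or truncated header. Size overflow is not handled here.
 */
static int parse_header_bounded(const char *hdr, size_t len,
				size_t *type_len, unsigned long *size)
{
	const char *end = hdr + len;
	const char *sp = memchr(hdr, ' ', len);
	const char *p;
	unsigned long n = 0;

	if (!sp || sp == hdr)
		return -1;	/* no space within bounds, or empty type */
	if (memchr(hdr, '\0', (size_t)(sp - hdr)))
		return -1;	/* NUL inside the type name */

	p = sp + 1;
	if (p == end || *p < '0' || *p > '9')
		return -1;	/* size must start with a digit */
	while (p < end && *p >= '0' && *p <= '9')
		n = n * 10 + (unsigned long)(*p++ - '0');

	if (p == end || *p != '\0')
		return -1;	/* ran out of buffer before the NUL */

	*type_len = (size_t)(sp - hdr);
	*size = n;
	return 0;
}
```

With this shape, a one-byte buffer holding just "x" is rejected by the
first memchr() and nothing past hdr[0] is touched.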

I haven't read through your follow-up yet; I'll do that before posting
my version.

>  static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
> @@ -1758,6 +1768,8 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
>  		char c = *hdr++;
>  		if (c == ' ')
>  			break;
> +		if (!c)
> +			die("invalid object header");
>  		type_len++;
>  	}

We keep reading from hdr after this, though I think those bits would all
bail correctly on seeing NUL.

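To make the "bail on NUL" claim concrete, here is a simplified sketch
of a digit-parsing step over a NUL-terminated header. It is an
assumption-laden stand-in, not a copy of what sha1_file.c does:
parse_size_field() is a hypothetical name, and details such as
leading-zero and overflow handling are left out. Subtracting '0' and
comparing as unsigned means any non-digit, NUL included, fails the
"<= 9" test, so the loop cannot walk past the terminator.

```c
/*
 * Hypothetical, simplified size parser. `hdr` must be NUL-terminated
 * and point just past the space that follows the type name. Returns 0
 * and stores the size, or -1 on a malformed header.
 */
static int parse_size_field(const char *hdr, unsigned long *size)
{
	unsigned long n = (unsigned long)(*hdr++ - '0');

	if (n > 9)
		return -1;	/* first byte is not a digit; NUL lands here too */
	for (;;) {
		unsigned long c = (unsigned long)(*hdr - '0');

		if (c > 9)
			break;	/* any non-digit, including NUL, stops the loop */
		hdr++;
		n = n * 10 + c;
	}
	if (*hdr)
		return -1;	/* the size must be followed directly by the NUL */

	*size = n;
	return 0;
}
```
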
-Peff