On Thu, Apr 14, 2016 at 11:07:09AM -0700, Junio C Hamano wrote:
> Even though a Git commit object is designed to be capable of storing
> any binary data as its payload, in practice people use it to describe
> the changes in textual form, and tools like "git log" are designed to
> treat the payload as text.
>
> Detect and warn when we see any commit object with a NUL byte in
> it.
>
> Note that a NUL byte in the header part is already detected as a
> grave error. This change is purely about the message part.
>
> Signed-off-by: Junio C Hamano <[email protected]>
Thanks, I was just reading over some of the old threads, and wondering
if it was time to resurrect this idea.
> @@ -610,6 +611,7 @@ static int fsck_commit_buffer(struct commit *commit,
> const char *buffer,
> struct commit_graft *graft;
> unsigned parent_count, parent_line_count = 0, author_count;
> int err;
> + const char *buffer_begin = buffer;
>
> if (verify_headers(buffer, size, &commit->object, options))
> return -1;
You need this "buffer_begin" because we move the "buffer" pointer
forward as we parse. But perhaps whole-buffer checks should simply go at
the top (next to verify_headers) before we start advancing the pointer.
To me, that makes the function's flow more natural.
But alternatively...
> @@ -671,6 +673,12 @@ static int fsck_commit_buffer(struct commit *commit,
> const char *buffer,
> if (err)
> return err;
> }
> + if (memchr(buffer_begin, '\0', size)) {
> + err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT,
> + "NUL byte in the commit object body");
> + if (err)
> + return err;
> + }
Here we've parsed to the end of the headers we know about. We know
there's no NUL there, because verify_headers() would have complained.
And because the individual header parsers would have complained. So I
actually think we could check from "buffer" (of course we do still need
to record the beginning of the buffer to adjust "size" appropriately).
It's a little more efficient (we don't have to memchr over the same
bytes again). But I'd worry a little that doing it that way would
introduce coupling between this check and verify_headers(), though (so
that if the latter ever changes, our check may start missing cases).
So yet another alternative would be to include this check in
verify_headers(). It would parse to the end of the headers as now, and
then from there additionally look for a NUL in the body.
Of the three approaches, I think I like that third one. It's the most
efficient, and I think the flow is pretty clear. We'd probably want to
rename verify_headers(), though. :)
-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html