On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e...@80x24.org> wrote:
> Eric Sunshine <sunsh...@sunshineco.com> wrote:
>> On Mon, May 30, 2016 at 7:21 PM, Eric Wong <e...@80x24.org> wrote:
>> > +                       if (pp->fmt == CMIT_FMT_MBOXRD &&
>> > +                                       !regexec(mboxrd_from, line, 0, 0, 
>> > 0))
>> > +                               strbuf_addch(sb, '>');
>>
>> At first glance, this seems dangerous since it's handing 'line' to
>> regexec() without paying attention to 'linelen'. For an arbitrary
>> regex, this could result in erroneous matches on subsequent "lines",
>> however, since this expression is anchored with '^', it's not a
>> problem. But, it is a bit subtle.
>
> Maybe having more context of the pp_remainder function and
> seeing the get_one_line call would've helped in the diff;
> I didn't think of this issue once I figured out where to
> place the change.

No, extra context wouldn't have helped. The problem is that
get_one_line() merely returns the length of the line, which might be
where the NUL-terminator is or it might be where the next newline is.
Therefore, you can't rely upon the "current line" being
NUL-terminated. So, in general, it's not "safe" to pass it to a
function expecting the "line" to end at a NUL-terminator.

> On the other hand, not being too familiar with git C APIs, I was
> more worried strbuf was not NUL-terminated for regexec, but it
> seems to be.

Yes, that's a guarantee, but it doesn't help in this case. Given
line="foo\nbar", get_one_line(line) will return 4, the length of
"foo\n", but regexec() won't know to stop looking until it hits the
NUL after the 'r'. An arbitrary regex, such as /bar/ will match beyond
what get_one_line() considers the end-of-line, which is why this code
looks scary (wrong) at first glance.

>> I wonder if hand-coding, rather than using a regex, could be an improvement:
>>
>>     static int is_mboxrd_from(const char *s, size_t n)
>>     {
>>         size_t f = strlen("From ");
>>         const char *t = s + n;
>>
>>         while (s < t && *s == '>')
>>             s++;
>>         return t - s >= f && !memcmp(s, "From ", f);
>>     }
>>
>> or something.
>
> Yikes.  I mostly work in high-level languages and do my best to
> avoid string parsing in C; so that scares me.  A lot.

The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
(I think) typical of how such a function would be coded in Git itself,
so it looks normal and easy to grok to me (but, of course, I'm
probably biased since I wrote it).

> I admit a regex isn't necessary, but I'm wondering if the above
> could be made less frightening to someone like me.

Perhaps, but it's difficult to say without knowing how it frightens you.

> Maybe extra test cases + valgrind can quell my fears :)

I can envision tests such as:

    ""
    "F"
    "From"
    "From "
    "From     "
    "From foobar"

and so on, if that's what you mean.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to