Am 09.08.2017 um 07:29 schrieb Junio C Hamano:
> René Scharfe <l....@web.de> writes:
> 
>> Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
>>> ... but in the meantime, I think replacing the test with "0$" to
>>> force the scanner to find either the end of line or the end of the
>>> buffer may be a good workaround.  We do not have to care how many of
>>> random bytes are in front of the last "0" in order to ensure that
>>> the regexec_buf() does not overstep to 4097th byte, while seeing
>>> that regexec() that does not know how long the haystack is has to do
>>> so, no?
>>
>> Our regexec() calls strlen() (see my other reply).
>>
>> Using "0$" looks like the best option to me.
> 
> Yeah, it seems that way.  If we want to be close/faithful to the
> original, we could do "^0*$", but the part that is essential to
> trigger the old bug is not the "we have many zeroes" (or "we have
> 4096 zeroes") part, but "zero is at the end of the string" part, so
> "0$" would be the minimal pattern that also would work for OBSD.

Thought about it a bit more.

"^0{4096}$" checks if the byte after the buffer is \n or \0 in the
hope of triggering a segfault.  On Linux I can access that byte just
fine; perhaps there is no guard page.  Also there is a 2 in 256
chance of the byte being \n or \0 (provided its value is random),
which would cause the test to falsely report success.

"0$" effectively looks for "0\n" or "0\0", which can only occur
after the buffer.  If such a sequence happens to appear shortly
after the buffer, we may not trigger a segfault and would report a
false positive.

In the face of unreliable segfaults we need to reverse our strategy,
I think.  Search for something that is not in the buffer (e.g. "1")
and treat both a match and a segfault as confirmation that the bug
is still present; that should avoid any false positives.  Right?

Thanks,
René
