On 05/17/2017 02:55 PM, Jeff King wrote:
> On Wed, May 17, 2017 at 02:05:27PM +0200, Michael Haggerty wrote:
>
>> diff --git a/refs/iterator.c b/refs/iterator.c
>> index bce1f192f7..f33d1b3a39 100644
>> --- a/refs/iterator.c
>> +++ b/refs/iterator.c
>> @@ -292,7 +292,19 @@ static int prefix_ref_iterator_advance(struct
>> ref_iterator *ref_iterator)
>> if (!starts_with(iter->iter0->refname, iter->prefix))
>> continue;
>>
>> - iter->base.refname = iter->iter0->refname + iter->trim;
>> + if (iter->trim) {
>> + /*
>> + * If there wouldn't be at least one character
>> + * left in the refname after trimming, skip
>> + * over it:
>> + */
>> + if (memchr(iter->iter0->refname, '\0', iter->trim + 1))
>> + continue;
>
> It took me a minute to figure out the logic here. You're looking for the
> end-of-string within the trim boundary, which would be an indication
> that the string itself is smaller than the boundary.
>
> But what if it returns true, and the string really is shorter than the
> trim size? That would mean we pass a size to memchr that is longer than
> the buffer we pass. Is that legal?
>
> I suspect it's undefined behavior according to the standard, though I'd
> guess in practice it would be fine. But if I'm understanding it
> correctly, this is the same check as:
>
> if (strlen(iter->iter0->refname) <= iter->trim)
>
> which seems a lot more obvious to me and doesn't fall afoul of weird
> standard issues. The only downside I see is that it would read to the
> end of string when yours could stop at iter->trim bytes. I have no idea
> if that would be measurable (it might even be faster because strlen()
> only has one condition to loop on).
You are correct that I chose `memchr()` over `strlen()` to avoid
scanning a POTENTIALLY EXTREMELY LARGE NUMBER OF CHARACTERS past the
trim length, but of course in real life refnames aren't that long and
`strlen()` might actually be faster.
I *think* `memchr()` is technically OK:
> Implementations shall behave as if they read the memory byte by byte
from the beginning of the bytes pointed to by s and stop at the first
occurrence of c (if it is found in the initial n bytes).
But I agree that the `strlen()` version is also easier to read (I
actually had that version first). So I'll change it as you have suggested.
Thanks,
Michael