Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

Eric Sunshine Thu, 15 Mar 2018 16:55:07 -0700

On Thu, Mar 15, 2018 at 7:35 PM, Lars Schneider
<larsxschnei...@gmail.com> wrote:
>> On 16 Mar 2018, at 00:25, Eric Sunshine <sunsh...@sunshineco.com> wrote:
>>>        if (is_encoding_utf8(src) && is_encoding_utf8(dst))
>>>                return 1;
>>> +       if (same_utf_encoding(src, dst))
>>> +               return 1;
>>>        return !strcasecmp(src, dst);
>>> }
>>
>> This seems odd. I would have expected the newly-added generalized
>> conditional to replace the original UTF-8-specific conditional, not
>> supplement it. That is, shouldn't the entire function body be:
>>
>>    if (same_utf_encoding(src, dst))
>>        return 1;
>>    return !strcasecmp(src, dst);
>
> No, because is_encoding_utf8() returns "true" (=1) if the encoding
> is NULL. That is not the case for UTF-16 et al. The caller of
> same_encoding() might expect that behavior.
> I could have moved the "UTF-8" == NULL assumption into
> same_utf_encoding() but that did not feel right.
> Does this make sense?


Not particularly.

Looking at 677cfed56a (commit-tree: cope with different ways "utf-8"
can be spelled., 2006-12-30), which introduced is_encoding_utf8()
along with its builtin NULL-check, I can see why that function wants
to treat NULL the same as UTF-8.

However, I'm having a tough time imagining cases in which callers
would want same_encoding() to return true if both arguments are NULL,
but outright crash if only one is NULL (which is the behavior even
before this patch). In other words, same_encoding() takes advantage of
is_encoding_utf8() for its convenience, not for its NULL-handling.
Given that view, the two explicit is_encoding_utf8() calls in
same_encoding() seem redundant once the same_utf_encoding() call is
added.

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

Reply via email to