
Junio C Hamano Tue, 06 Mar 2018 12:50:43 -0800

[email protected] writes:

> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
> +{
> +     return (
> +        !strcmp(enc, "UTF-16") &&
> +        !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
> +          has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
> +     ) || (
> +        !strcmp(enc, "UTF-32") &&
> +        !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
> +          has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
> +     );
> +}


These strcmp() calls seem inconsistent with the principle embodied
by utf8.c::fallback_encoding(), i.e. "be lenient to what we accept",
and make the interface uneven.  I am wondering if we also want to
complain when the user gave us "utf16" and there is no byte order
mark in the contents, for example?  Also, "UTF16" or any other spelling
that the platform supports but this code fails to recognise will go
unchecked.

Which actually may be a feature, not a bug, to be able to bypass
this check---I dunno.
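
To make the concern concrete, here is a rough sketch of what a more
lenient check might look like.  It is only an illustration, not a
proposal for the actual patch: same_utf_name() and
is_missing_required_utf_bom_lenient() are hypothetical names, and
has_bom_prefix() and the BOM tables below are simplified stand-ins for
the ones in the series.  It assumes POSIX strcasecmp()/strncasecmp()
are available.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp(), strncasecmp() (POSIX) */

/* Stand-ins for the BOM tables used by the quoted patch. */
static const char utf16_be_bom[] = {'\xFE', '\xFF'};
static const char utf16_le_bom[] = {'\xFF', '\xFE'};
static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
static const char utf32_le_bom[] = {'\xFF', '\xFE', '\0', '\0'};

/* Stand-in: does 'data' start with the given byte order mark? */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Hypothetical helper: does 'enc' spell "UTF-<suffix>", ignoring
 * case and an optional dash?  "UTF-16", "utf16" and "Utf-16" all
 * match suffix "16"; "UTF-16LE" does not.
 */
static int same_utf_name(const char *enc, const char *suffix)
{
	if (!enc || strncasecmp(enc, "utf", 3))
		return 0;
	enc += 3;
	if (*enc == '-')
		enc++;
	return !strcasecmp(enc, suffix);
}

/* Same logic as the quoted function, with the lenient name match. */
static int is_missing_required_utf_bom_lenient(const char *enc,
					       const char *data, size_t len)
{
	return (
	   same_utf_name(enc, "16") &&
	   !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
	     has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
	) || (
	   same_utf_name(enc, "32") &&
	   !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
	     has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
	);
}
```

With something along these lines, "utf16" without a byte order mark
would be flagged just like "UTF-16", at the cost of losing the bypass
mentioned above.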

The same comment applies to the previous step.

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM
