Peter Krefting writes:

> brian m. carlson:
>> +            /* U+FFFE and U+FFFF are guaranteed non-characters. */
>> +            if ((codepoint & 0x1ffffe) == 0xfffe)
>> +                    return bad_offset;
> I missed this the first time around: All Unicode code points whose
> lower 16 bits are FFFE or FFFF are non-characters, so you can rewrite
> that to:
>   /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>   if ((codepoint & 0xfffe) == 0xfffe)
>    return bad_offset;
> Also, the code points in the range U+FDD0..U+FDEF are non-characters,
> if you wish to be really pedantic.

Yeah, while we are at it, doing this may not hurt.  I think Brian's
two patches are in fairly good shape otherwise, so perhaps you can
do this as a follow-up patch on top of the tip of the topic,
e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?
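
For illustration only, here is a rough sketch of the two tests such a
follow-up might combine.  The helper name is_noncharacter() is
hypothetical and does not appear in the patches; the real change would
presumably fold these conditions into the existing validation code and
return bad_offset.  It assumes the codepoint has already been
range-checked to be at most U+10FFFF.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patches under discussion.
 * Returns 1 if codepoint is a Unicode non-character, 0 otherwise.
 * Assumes the caller has already rejected values above 0x10ffff.
 */
static int is_noncharacter(uint32_t codepoint)
{
	/* U+xxFFFE and U+xxFFFF: the last two code points of every plane. */
	if ((codepoint & 0xfffe) == 0xfffe)
		return 1;
	/* U+FDD0..U+FDEF: the contiguous non-character block in the BMP. */
	if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
		return 1;
	return 0;
}
```

Masking with 0xfffe ignores both the plane bits and the lowest bit, so
one comparison covers the FFFE/FFFF pair in all 17 planes, whereas the
original 0x1ffffe mask only matches plane 0.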
