Well, it's hard to grep for '3' in a big codebase. They should have used UTFmax in
the first place.
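Here is a minimal sketch of why the named constant matters. A buffer sized with UTFmax follows libc.h when the value changes, but a literal 3 silently stays too small. Only UTFmax is copied from the libc.h excerpt in this message. The buffer and function names are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* Value copied from the libc.h excerpt quoted in this message. */
enum { UTFmax = 4 };

/* Hypothetical example: a literal size goes stale when UTFmax changes. */
static char stale_buf[3];        /* hard to find later by grepping for '3' */
static char sized_buf[UTFmax];   /* tracks libc.h automatically */

/*
 * C11 compile-time guard: the build fails if the buffer cannot hold the
 * longest UTF sequence. The same check on stale_buf would fail to compile.
 */
_Static_assert(sizeof sized_buf >= UTFmax, "rune buffer too small");

size_t stale_capacity(void) { return sizeof stale_buf; }
size_t sized_capacity(void) { return sizeof sized_buf; }
```

With the constant in place, raising UTFmax from 3 to 4 would have resized every such buffer without anyone having to hunt for the literal.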
I see, though, that libc.h currently has:
enum
{
	UTFmax		= 4,		/* maximum bytes per rune */
	Runesync	= 0x80,		/* cannot represent part of a UTF sequence (<) */
	Runeself	= 0x80,		/* rune and UTF sequences are the same (<) */
	Runeerror	= 0xFFFD,	/* decoding error in UTF */
	Runemax		= 0x10FFFF,	/* 21-bit rune */
	Runemask	= 0x1FFFFF,	/* bits used by runes (see grep) */
};
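The constants above determine how many bytes a rune takes. Below is a minimal sketch of how a rune's value maps to its encoded length. It assumes standard UTF-8 byte ranges and uses the values from this excerpt. `sketch_runelen` is a hypothetical helper, not the libc function. Mapping out-of-range values to Runeerror is an assumption about how an encoder would handle them.

```c
#include <stddef.h>

/* Values copied from the libc.h excerpt above. */
enum
{
	Runeself	= 0x80,
	Runeerror	= 0xFFFD,
	Runemax		= 0x10FFFF,
};

/*
 * Hypothetical helper (not Plan 9's runelen): number of UTF-8 bytes
 * needed to encode rune r. Values above Runemax are counted as if
 * Runeerror were encoded instead (an assumption). UTF-16 surrogates
 * are not treated specially in this sketch.
 */
size_t sketch_runelen(unsigned long r)
{
	if (r > Runemax)
		r = Runeerror;
	if (r < Runeself)
		return 1;	/* below Runeself: rune and byte are identical */
	if (r < 0x800)
		return 2;	/* fits in 11 bits */
	if (r < 0x10000)
		return 3;	/* fits in 16 bits */
	return 4;		/* up to 21 bits, Runemax included */
}
```

With these boundaries, every rune from 0x10000 up to Runemax needs four bytes, which is the case UTFmax = 4 covers.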
So Runemax seems to indicate that we never produce a rune needing more than 3 bytes, no?
So maybe buf[3] is safe?
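One concrete way to check the buf[3] question is to encode Runemax into a UTFmax-sized buffer and count the bytes. `sketch_encode` is a hypothetical encoder written for illustration and is not Plan 9's runetochar. It assumes the standard UTF-8 bit layout and maps values above Runemax to Runeerror.

```c
#include <stddef.h>

/* Values copied from the libc.h excerpt in this message. */
enum { UTFmax = 4, Runeerror = 0xFFFD, Runemax = 0x10FFFF };

/*
 * Hypothetical encoder: writes the UTF-8 form of r into s and returns
 * the number of bytes written. s must have room for UTFmax bytes.
 */
size_t sketch_encode(unsigned char *s, unsigned long r)
{
	if (r > Runemax)
		r = Runeerror;	/* assumption: substitute the error rune */
	if (r < 0x80) {
		s[0] = (unsigned char)r;
		return 1;
	}
	if (r < 0x800) {
		s[0] = (unsigned char)(0xC0 | (r >> 6));
		s[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if (r < 0x10000) {
		s[0] = (unsigned char)(0xE0 | (r >> 12));
		s[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		s[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	/* 0x10000..Runemax: the fourth byte is required */
	s[0] = (unsigned char)(0xF0 | (r >> 18));
	s[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	s[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	s[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

For Runemax this writes F4 8F BF BF, which is four bytes. A buf[3] passed to an encoder like this would be overrun by one byte for any rune at or above 0x10000.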
On Jun 18, 2014, at 10:36 AM, erik quanstrom <[email protected]> wrote:
> On Wed Jun 18 13:36:09 EDT 2014, [email protected] wrote:
>> used to be 3 :)
>>
>> "UTFmax, defined as 3 in <libc.h>, is the maximum number of bytes
>> required to represent a rune."
>
> which is exactly why this should have been caught.
> this one's my fault.
>
> - erik