--On Sunday, October 23, 2011 07:11 +0100 Dave CROCKER
<[email protected]> wrote:
>
>> Remember, in UTF-8, characters can be multiple octets. So 998
>> UTF-8 encoded *characters* are likely to be many more than
>> 998 octets long. So the change is to say that the limit is in
>> octets, not in characters.
>
>
> The switch in vocabulary is clearly subtle for readers. (I
> missed it too.)
>
> I suggest adding some language that highlights the point,
> possibly the same language as you just used to explain it.

In addition to whatever might be useful or necessary for readers of
5335bis, in retrospect we ought to have a prominent comment in
one of the more generic i18n documents highlighting the fact
that, once one moves beyond ASCII, length-in-characters and
length-in-octets can no longer be assumed to be the same. When
one is actually talking about storage length,
length-in-characters should be prohibited from our vocabulary
going forward. Catching such usage would make an interesting
extension to a nits-checker, if someone could figure out how to
do it, or, failing that, something to flag for the RFC Editor
to watch out for.
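
To make the distinction concrete, here is a minimal Python sketch.
It assumes the 998-octet line limit under discussion (excluding the
trailing CRLF). The function names and the regex used for the
nits-style check are hypothetical illustrations, not part of idnits
or any existing tool, and the regex is only a crude heuristic.

```python
import re

# Assumed limit: 998 octets per line, excluding the trailing CRLF.
MAX_LINE_OCTETS = 998

# Hypothetical nits-style pattern: a number followed by "character(s)",
# e.g. "998 characters", which may be a length stated in the wrong unit.
CHAR_LENGTH_PATTERN = re.compile(r"\b\d+\s+characters?\b", re.IGNORECASE)


def line_length_report(line: str) -> tuple[int, int]:
    """Return (characters, octets) for a line once encoded as UTF-8."""
    return len(line), len(line.encode("utf-8"))


def fits_octet_limit(line: str, limit: int = MAX_LINE_OCTETS) -> bool:
    """True if the UTF-8 encoding of `line` is within `limit` octets."""
    return len(line.encode("utf-8")) <= limit


def flag_character_lengths(text: str) -> list[str]:
    """List phrases such as '998 characters' that a reviewer might check."""
    return [m.group(0) for m in CHAR_LENGTH_PATTERN.finditer(text)]


if __name__ == "__main__":
    ascii_line = "a" * 998
    cjk_line = "\u4e2d" * 998  # U+4E2D encodes as 3 octets in UTF-8
    for name, line in (("ascii", ascii_line), ("cjk", cjk_line)):
        chars, octets = line_length_report(line)
        print(name, chars, octets, fits_octet_limit(line))
    print(flag_character_lengths("Lines MUST NOT exceed 998 characters."))
```

Both test lines are 998 characters long, but only the ASCII one fits
in 998 octets; the CJK line needs 2994.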
john
_______________________________________________
Ietf mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ietf