On 2024.04.09 09:00, Thomas Schmitt via Libcdio-devel wrote:
> Hi,
> Pete Batard wrote:
> > Or maybe there's a mathematical proof that
> > a UTF-8 glyph byte encoding can never be larger than 1.5 the UTF-16
> > glyph byte encoding
> I thought to have given one. Let me try again:
> https://datatracker.ietf.org/doc/html/rfc3629
> "In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
> accessible range) are encoded using sequences of 1 to 4 octets."
My issue is that I see nothing preventing Unicode from adding ranges
beyond U+10FFFF that would require more than 1.5× the UTF-16 size.
That's what I tried to allude to in my reply.
In short, especially considering that we have precedent of the Unicode
Consortium expanding on what it had previously established as an upper
range limit, I still see no formal proof that 1.5× the UTF-16 size will
*always* be enough for the UTF-8 conversion, especially as I fully
expect Unicode to grow past the U+10FFFF boundary.
> I try to obey specs and to avoid speculations about what of their
> provisions would possibly not happen in practice.
I try to base my implementations on real-life usage, rather than on
obvious abuse of the specs.
> To my experience this pays off on the long run.
In my experience, my approach pays off better, because the day you need
to alter your current implementation, you are not speculating about how
the specs might be (ab)used; you are acting on direct empirical data.
Plus, and this is the most important part, placing obvious stopgaps for
scenarios you have yet to see (and expect to rarely, if ever, happen)
is the only way I know to get proper feedback on how your code is used,
which does help you design better software in the long run. Otherwise,
IMO, you're just producing academic code, with no clue about how it is
actually being used.
In this instance, if someone does see a truncation warning and runs
into an issue as a result, I very much want to get feedback on how they
got it, and what kind of image they are producing. Because it'll give
me data not just for this specific case, but for all the corollary
real-life usage, which is paramount to understanding your user base and
in turn to designing software that actually meets their needs (rather
than what one expects their needs to be).
Regards,
/Pete