> On Jun 1, 2016, at 10:43 AM, Kamil Cholewiński <[email protected]> wrote:
>
>> On Wed, 01 Jun 2016, Ben Woolley <[email protected]> wrote:
>>
>> That is the reason why I am erring on the side of 5% this time.
>
> The 95% use case here is handling UTF8-encoded Unicode text. Secure by
> default should be the norm, not a magic flag, not buried in a readme.
Yes, that is what I am suggesting for libutf. I believe we have the same
concern.

> If you need to encode an arbitrarily large integer into a stream of
> bytes, then use a library specifically designed for encoding arbitrarily
> large integers into streams of bytes.

Yes, that is what I am suggesting for "libctf", and that it not be called
UTF. Then the encoding expert making the next encoding update will
hopefully be the only one messing with it. I could have used a "libctf"
before, when updating an app beyond what was available in the libraries I
was stuck with.

> Yes, we're making up problems.

Or are we ultimately agreeing? :)

The reason why I am looking at this on a several-year time span is this:
how often do people review encoding implementations? Probably once every
5 years. With changes to the standard every 7 years, random Joe needs to
be able to glance at a libutf and see the quirks in a wrapper, rather
than having to touch a slightly convoluted transformation function just
to check whether a range is handled properly.

I am basing these thoughts on things I have actually done. For example,
at one company, I worked on a statistical dictionary compression scheme
that placed its symbols in between "CTF" ranges. That was essentially a
libctfcomp library that could be consumed by an unaltered libutf. That
way, the change could be made securely with much less effort. I have
worked with UTF-8 at this level at 3 different companies already. Maybe
there is a real need for a libctf.
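To make the split concrete, here is a rough sketch of what I mean. All
names and signatures here are hypothetical, not any existing library's
API: ctf_decode() handles the generalized 1-6 byte integer encoding (the
original FSS-UTF shape), and utf_decode() is the secure-by-default
Unicode wrapper where the quirks live in plain sight, so a reviewer can
check the ranges without touching the transformation function.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical "libctf" layer: decode one sequence of the generalized
 * 1-6 byte encoding of an arbitrary integer.  Returns the number of
 * bytes consumed, or 0 on error (truncated input, bad continuation
 * byte, non-shortest form, invalid lead byte).
 */
static int ctf_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    static const struct {
        unsigned char mask, lead;
        int len;
        uint32_t min;   /* smallest value this length may encode */
    } tab[] = {
        { 0x80, 0x00, 1, 0x0 },
        { 0xE0, 0xC0, 2, 0x80 },
        { 0xF0, 0xE0, 3, 0x800 },
        { 0xF8, 0xF0, 4, 0x10000 },
        { 0xFC, 0xF8, 5, 0x200000 },
        { 0xFE, 0xFC, 6, 0x4000000 },
    };

    if (n == 0)
        return 0;
    for (int i = 0; i < 6; i++) {
        if ((s[0] & tab[i].mask) == tab[i].lead) {
            if ((size_t)tab[i].len > n)
                return 0;                     /* truncated sequence */
            uint32_t v = s[0] & ~tab[i].mask;
            for (int j = 1; j < tab[i].len; j++) {
                if ((s[j] & 0xC0) != 0x80)
                    return 0;                 /* bad continuation byte */
                v = (v << 6) | (s[j] & 0x3F);
            }
            if (v < tab[i].min)
                return 0;                     /* overlong (non-shortest) form */
            *out = v;
            return tab[i].len;
        }
    }
    return 0;                                 /* invalid lead byte */
}

/*
 * Hypothetical "libutf" wrapper: the Unicode-specific quirks are the
 * only thing here, so they are the only thing a reviewer has to read.
 */
static int utf_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t v;
    int len = ctf_decode(s, n, &v);
    if (len == 0)
        return 0;
    if (v > 0x10FFFF)                         /* beyond Unicode's range */
        return 0;
    if (v >= 0xD800 && v <= 0xDFFF)           /* UTF-16 surrogate range */
        return 0;
    *out = v;
    return len;
}
```

When the standard next changes its ranges, only the wrapper moves; the
transformation function underneath stays untouched, and something like
the libctfcomp case can consume ctf_decode() directly.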
