On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:

There are lots of places where invalid Unicode is either commonplace or legal, e.g. Linux file names, and therefore auto decoding cannot be used. It turns out in the wild that pure Unicode is not universal - there's lots of dirty Unicode that should remain unmolested because it's user data, and auto decoding does not play well with that mentality.
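The quoted point about Linux file names can be illustrated outside D. Below is a minimal Python sketch. It assumes a byte-oriented Linux file system, where a name can be any bytes except "/" and NUL. The file name is hypothetical. The `surrogateescape` error handler is Python's own mechanism (PEP 383), not something D provides. The sketch shows three things: strict UTF-8 decoding rejects such a name, a lossless decoding lets it round-trip unchanged, and replacement-style decoding silently corrupts it.

```python
# Hypothetical Linux file name: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw_name = b"report-caf\xe9.txt"

def strict_decode(name: bytes):
    """Decode as UTF-8; return None if the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None

def lossless_decode(name: bytes) -> str:
    """Decode so that invalid bytes become lone surrogates and can be restored later."""
    return name.decode("utf-8", errors="surrogateescape")

if __name__ == "__main__":
    print(strict_decode(raw_name))  # None: strict decoding refuses the name
    text = lossless_decode(raw_name)
    # Re-encoding with the same handler gives back the exact original bytes.
    print(text.encode("utf-8", errors="surrogateescape") == raw_name)  # True
    # "replace" also succeeds, but it swaps 0xE9 for U+FFFD, so the name is lost.
    replaced = raw_name.decode("utf-8", errors="replace")
    print(replaced.encode("utf-8") == raw_name)  # False
```

The user data survives only if the code either keeps the raw bytes or uses a reversible mapping. Any decoding step that "cleans up" the name breaks the round trip.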

As a slightly tangential aside...

There is a proposal for a Linux kernel module that would make it impossible to create such names.

I for one will install it on all my systems as soon as I can.

However, until then, my day job requires me to find, scan, analyze, and work with whatever crud the herd of cats I work with throws into the repo.

And no, sadly, I can't just rewrite everything, because they (or some tool they use) don't understand UTF-8.
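For the "find and scan" part of that job, here is a minimal Python sketch of a scanner that lists names in a tree that are not valid UTF-8. It assumes a Linux-style file system. The function name `non_utf8_paths` is hypothetical. The key choice is walking the tree with a `bytes` root: `os.walk` then yields raw byte names, so nothing is decoded or altered before the check.

```python
import os

def non_utf8_paths(root: bytes):
    """Yield full paths (as bytes) under root whose last component is not valid UTF-8."""
    # A bytes root makes os.walk return bytes names, so the raw data is inspected untouched.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    for bad in non_utf8_paths(b"."):
        # repr() prints the raw bytes, so the report itself is always printable.
        print(repr(bad))
```

The scanner only reports offending names. Whether to rename them is a separate decision, since other tools may depend on the exact bytes.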
