On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
There are lots of places where invalid Unicode is either
commonplace or legal, e.g. Linux file names, and therefore
auto decoding cannot be used. It turns out in the wild that
pure Unicode is not universal - there's lots of dirty Unicode
that should remain unmolested because it's user data, and auto
decoding does not play well with that mentality.
As a slightly tangential aside:
There is a proposal for a Linux kernel module that would make
creating such names impossible.
I, for one, will install it on all my systems as soon as I can.
However, until then, my day job requires me to find, scan,
analyze, and work with whatever crud the herd of cats I work
with throws into the repo.
And no, sadly, I can't just rewrite everything because they (or
some tool they use) don't understand UTF-8.
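
For what it's worth, here is a rough sketch of the kind of scan I mean, in Python rather than D, purely for illustration. It treats file names as raw bytes, flags the ones that are not valid UTF-8, and never rewrites them. The function names (`is_valid_utf8`, `printable`, `find_invalid_names`) are made up for this example; the only library behaviour relied on is that `os.walk` returns bytes when given a bytes path, and that the `surrogateescape` error handler round-trips undecodable bytes.

```python
import os


def is_valid_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def printable(raw):
    """Decode for display without losing data.

    surrogateescape maps each undecodable byte to a lone surrogate
    (U+DC80..U+DCFF), so encoding with the same handler gives back
    the original bytes exactly.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def find_invalid_names(root):
    """Yield raw paths under root (a bytes path) whose file name is not valid UTF-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)
```

Something like `for path in find_invalid_names(b"."): print(ascii(printable(path)))` would then list the offenders without touching them, which is about all one can safely do with other people's data.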