On 22/07/16 04:23, Assaf Gordon wrote:
> Hello,
>
>> On Jul 21, 2016, at 06:08, Pádraig Brady <[email protected]> wrote:
>> [...]
>> It seems like --normalization={NFKC,NFKD,NFC,NFD} functionality would
>> also be quite cohesive in such a util.
>
> Attached an improved version with unicode normalization support.

Wow, very nice.

> Before continuing with other stuff (e.g. more tests, documentation, news,
> etc.), it's worth discussing if this is the path to take (or if we want
> to add this to each individual utility).

I'm not sure, but as I said, it would be nice if we could get away
with "replace" mode in other utils. Having a separate util follows
the idea of validating/transforming input as early as possible, so as
to simplify the rest of the system. It also follows the idea that if
something can be done separately, it should be.

> Also, do we keep these options or modify them?
> e.g. 'uconv' uses different terminology for handling invalid sequences: stop,
> skip, substitute, escape (corresponding to abort, discard, replace, recode
> below).

Doesn't really matter. I find your naming slightly more descriptive.

> To keep the implementation simple, unicode normalization requires UTF-8
> locales - is this a valid requirement?

Given how prevalent UTF-8 is, I think this is fine. In other tools,
if there is an option, we should also tune for UTF-8 input.

> And of course, what about the name?

I've a slight preference for unorm.

thanks!
Pádraig
