Hello,

Updated patch attached.
Improvements since last time (http://lists.gnu.org/archive/html/coreutils/2016-09/msg00011.html):

1. 'multibyte' and 'mbbuffer' are now in gl/ and behave more like gnulib modules. Tests cover all items mentioned in Markus Kuhn's UTF-8 decoder stress-test page (https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt).

2. cygwin/UTF-16 surrogates are handled transparently in 'mbbuffer'. Applications under cygwin see 'ucs4_t' and don't need to worry about surrogates (though wcwidth() will still present some problems). Tests ensure that parsing under cygwin behaves the same as on other systems.

3. 'cut' supports multibyte '-c' and '-n -b' (but not multibyte '-d' yet). Some tests are included.

Comments welcome,
- assaf
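To illustrate the surrogate handling in point 2, here is a minimal C sketch of how a UTF-16 surrogate pair (as produced with cygwin's 16-bit wchar_t) can be folded into a single code point before the caller sees it. This is not the patch's actual 'mbbuffer' code: the function name utf16_decode is hypothetical, uint32_t stands in for gnulib's ucs4_t, and mapping unpaired surrogates to U+FFFD is an assumed error policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the patch's API): decode one code point
   from a buffer of N UTF-16 units starting at S, storing it in *OUT.
   Returns the number of units consumed (1 or 2), or 0 if N == 0.
   Assumed policy: an unpaired surrogate yields U+FFFD and consumes
   one unit, so the caller can resynchronize on the next unit.  */
static size_t
utf16_decode (const uint16_t *s, size_t n, uint32_t *out)
{
  assert (out != NULL);
  if (n == 0)
    return 0;

  uint16_t hi = s[0];
  if (hi >= 0xD800 && hi <= 0xDBFF)
    {
      /* High surrogate: valid only if followed by a low surrogate.  */
      if (n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF)
        {
          /* Each surrogate carries 10 bits; add back the 0x10000 offset.  */
          *out = 0x10000 + (((uint32_t) (hi - 0xD800) << 10)
                            | (uint32_t) (s[1] - 0xDC00));
          return 2;
        }
      *out = 0xFFFD;            /* unpaired high surrogate */
      return 1;
    }
  if (hi >= 0xDC00 && hi <= 0xDFFF)
    {
      *out = 0xFFFD;            /* stray low surrogate */
      return 1;
    }

  *out = hi;                    /* BMP character, one unit */
  return 1;
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600 and consumes two units, so a loop advancing by the return value never hands a lone surrogate to the application.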
multibyte-2016-09-19.patch.xz
