Hello,

Updated patch attached.

Improvements since the last version
(http://lists.gnu.org/archive/html/coreutils/2016-09/msg00011.html):

1. 'multibyte' and 'mbbuffer' are now in gl/ and behave more like gnulib modules.
Tests cover all items mentioned in Markus Kuhn's UTF-8 decoder test file
(https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt).

2. cygwin/UTF-16 surrogates are handled transparently in 'mbbuffer'.
Applications under cygwin see 'ucs4_t' and don't need to worry about surrogates
(although wcwidth() will still present some problems). Tests ensure that parsing
under cygwin behaves as it does on other systems.

3. 'cut' supports multibyte '-c' and '-n -b' (but not multibyte '-d' yet).
Some tests included.
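
To illustrate what "handled transparently" in item 2 means, here is a minimal
sketch of combining a UTF-16 surrogate pair into a single code point. It is not
taken from the patch: the helper names are hypothetical, and uint32_t stands in
for gnulib's 'ucs4_t'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the code in 'mbbuffer'. */

/* High (lead) surrogates occupy U+D800..U+DBFF. */
static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trail) surrogates occupy U+DC00..U+DFFF. */
static int
is_low_surrogate (uint16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid surrogate pair into one code point in the range
   U+10000..U+10FFFF.  uint32_t is used here in place of ucs4_t.  */
static uint32_t
combine_surrogates (uint16_t hi, uint16_t lo)
{
  assert (is_high_surrogate (hi));
  assert (is_low_surrogate (lo));
  return 0x10000
         + ((uint32_t) (hi - 0xD800) << 10)
         + (uint32_t) (lo - 0xDC00);
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, so a caller that only
ever receives the combined value never has to look at the two 16-bit halves.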


Comments welcome,
 - assaf


Attachment: multibyte-2016-09-19.patch.xz
Description: Binary data
