Pádraig Brady <[email protected]> writes:

> The issue is that cut(1) does not support multi-byte characters yet,
> and is treating -c like -b. This can cause cut(1) to
> output a partial multi-byte character. In your case,
> the following shows it starts outputting in the middle of the
> UTF-8 Narrow non-breaking space character:
>
>   LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
>   head -n2115 | tail -n1 | od -Ax -tx1z -v
>   000000 33 31 30 30 e2 80 af c3 9c 62 0a  >3100.....b.<
>
> This is already on our TODO list.

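To make the -b versus -c distinction concrete, here is a minimal Python sketch. It is not coreutils code: cut_bytes and cut_chars are hypothetical helpers invented for illustration, the input is just the ten bytes from the od dump above with the trailing newline dropped, and the input is assumed to be valid UTF-8.

```python
# The ten bytes shown in the od dump above, without the trailing newline.
line = bytes.fromhex("33313030e280afc39c62")


def cut_bytes(data: bytes, n: int) -> bytes:
    """Keep the first n bytes, which is what -b1-n selects."""
    return data[:n]


def cut_chars(data: bytes, n: int, encoding: str = "utf-8") -> bytes:
    """Keep the first n characters, as a multi-byte aware -c1-n would.

    Assumes the input is valid in the given encoding.
    """
    return data.decode(encoding)[:n].encode(encoding)


# Byte-based selection stops after the lone lead byte 0xe2,
# so the result is no longer valid UTF-8.
print(cut_bytes(line, 5).hex(" "))

# Character-based selection keeps the whole e2 80 af sequence
# (U+202F NARROW NO-BREAK SPACE).
print(cut_chars(line, 5).hex(" "))
```

Decoding the whole line first is only reasonable for a demonstration; a real implementation would have to scan the input incrementally and cope with invalid sequences.
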
I haven't yet thought of a decent interface for multibyte characters
that behaves like getndelim2, which 'cut' needs. Outside of that, it
should not be too difficult.

Collin
