Pádraig Brady writes:

> The issue is that cut(1) does not support multi-byte characters yet,
> and is treating -c like -b.  This can cause cut(1) to
> output a partial multi-byte character. In your case,
> the following shows it starts outputting in the middle of the
> UTF-8 Narrow non-breaking space character:
>
>   LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
>    head -n2115 | tail -n1 | od -Ax -tx1z -v
>   000000 33 31 30 30 e2 80 af c3 9c 62 0a                 >3100.....b.<
>
> This is already on our TODO list.

I haven't yet thought of a decent interface for multibyte characters
that behaves like getndelim2, which 'cut' needs. Apart from that, it
should not be too difficult.
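
To make the -b versus -c distinction concrete, here is a minimal sketch
in Python (not cut's C, and not how cut is or will be implemented). The
sample bytes are taken from the od dump quoted above; the helper names
cut_bytes, cut_chars and is_valid_utf8 are hypothetical, and the sketch
assumes the whole line is valid UTF-8 and treats one code point as one
character.

```python
# Illustration only: hypothetical helpers, not coreutils code.
# Sample line copied from the od dump quoted above:
# "3100", U+202F (e2 80 af), U+00DC (c3 9c), "b", newline.
sample = bytes.fromhex("33313030e280afc39c620a")


def cut_bytes(data: bytes, n: int) -> bytes:
    """Keep the first n bytes, which is what -b1-n selects."""
    return data[:n]


def cut_chars(data: bytes, n: int) -> bytes:
    """Keep the first n characters: decode UTF-8, slice, re-encode.

    Assumes the input is valid UTF-8 and counts code points.
    """
    return data.decode("utf-8")[:n].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Byte slicing stops inside U+202F; character slicing does not.
print(cut_bytes(sample, 6).hex(" "))   # 33 31 30 30 e2 80
print(is_valid_utf8(cut_bytes(sample, 6)))   # False
print(cut_chars(sample, 6).hex(" "))   # 33 31 30 30 e2 80 af c3 9c
```

Decoding the whole line first is only for illustration; a real
implementation would also have to decide what to do with invalid
sequences and would read input incrementally.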

Collin


