bug#29606: Command 'fold' dangerous with utf-8 input

Mark Roberts Thu, 07 Dec 2017 09:37:40 -0800

Dear Assaf,

> If you'd like to help us test these patches, please try
> an unofficial development snapshot here:
>
> https://files.housegordon.org/src/coreutils-multibyte-experimental-8.28.39-79242.tar.xz


I have taken a look and have an unexpected result:

fold (version 8.28.39-79242) reacts to my LANG environment variable, which is good, but it ignores the --bytes or -b flag, which is surprising.

My test case uses 'echo' to send the German sharp s character (ß), which is a two-byte character in UTF-8, and a newline to 'fold --width 1'. I then use 'head -1' and 'wc --bytes' to count the bytes in line one.

If UTF-8 is set, this should strip off one character (two bytes) plus one newline. It does.


If UTF-8 is not set, it should strip off one byte and a newline. It does.

If 'fold --width 1 --bytes' is used, it should always strip off one byte and a newline, regardless of environment settings. It doesn't. The '--bytes' switch has no effect.


Here are the test cases (the new versions of the coreutils programs are in src/):

export LANG=""
src/echo ß | src/fold --bytes --width 1 | src/head -1 | src/wc --bytes

This is correct: fold splits the line between the two bytes and puts a newline after each. Counting bytes in the first line gives 2, including the newline.

export LANG="de_DE.UTF-8"
src/echo ß | src/fold --bytes --width 1 | src/head -1 | src/wc --bytes

This is wrong: fold has kept both bytes of the character on line one, although fold --bytes --width 1 should split after one byte.

export LANG=""
src/echo ß | src/fold --width 1 | src/head -1 | src/wc --bytes

This is correct: without a language setting, fold treats each byte as a character.

export LANG="de_DE.UTF-8"
src/echo ß | src/fold --width 1 | src/head -1 | src/wc --bytes

3

This is correct: The two-byte character remains on line one.
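
For anyone who wants to repeat the four cases in one go, here is a minimal sketch. It makes a few assumptions that are not part of my tests above: it calls whatever 'fold' is first on PATH (put src/ at the front of PATH to test the snapshot), it sets LC_ALL rather than LANG so that no other locale variable interferes, and the helper name first_line_bytes and the UTF8_LOCALE variable are made up for illustration. If the UTF-8 locale is not installed, fold will most likely fall back to byte-wise behaviour.

```shell
# Hypothetical helper: count the bytes in the first line that fold emits
# for the UTF-8 encoding of ß (bytes 0xC3 0x9F) followed by a newline.
# Usage: first_line_bytes LOCALE [fold options...]
first_line_bytes() {
    loc=$1
    shift
    # printf with octal escapes emits the raw bytes, independent of the
    # encoding this script is saved in. tr strips the padding that some
    # wc implementations put in front of the count.
    printf '\303\237\n' | LC_ALL=$loc fold "$@" | head -n 1 | wc -c | tr -d ' '
}

# Assumed locale name; override with UTF8_LOCALE if de_DE.UTF-8 is not installed.
utf8_locale=${UTF8_LOCALE:-de_DE.UTF-8}

for loc in C "$utf8_locale"; do
    printf '%s, fold -b -w 1: %s\n' "$loc" "$(first_line_bytes "$loc" -b -w 1)"
    printf '%s, fold -w 1:    %s\n' "$loc" "$(first_line_bytes "$loc" -w 1)"
done
```

In the C locale both lines should report 2. The interesting row is the UTF-8 locale with -b, where I would expect 2 and the snapshot gives a different count.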

Have I misunderstood what "fold --bytes" is supposed to mean? Or is this an error?


All the best,
Mark
