On 17/07/16 02:52, Assaf Gordon wrote:
> Hello,
>
>> On Jun 27, 2016, at 06:56, Pádraig Brady <[email protected]> wrote:
>>
>> On 27/06/16 06:17, Assaf Gordon wrote:
>>> Hello Pádraig and all,
>>>
>>>> On Jun 25, 2016, at 07:20, Pádraig Brady <[email protected]> wrote:
>>>>
>>>> As part of this, or at least before looking at multibyte changes,
>>>> it would be worth considering this proposal for changing the
>>>> unexpand algorithm: http://bugs.gnu.org/23335
>>>
>>> The above bug-report addresses this TODO item:
>>> ===
>>> unexpand: [http://www.opengroup.org/onlinepubs/007908799/xcu/unexpand.html]
>>> printf 'x\t \t y\n'|unexpand -t 8,9 should print its input, unmodified.
>>> printf 'x\t \t y\n'|unexpand -t 5,8 should print "x\ty\n"
>>> ===
>>
>> I think the second command is wrong there actually?
>> Surely it should print "x\t\t y\n"
>
> Digging a bit deeper into the various 'unexpand' implementations, it seems
> there are more differences.
> Attached is a summary of most of coreutils' unexpand tests on various systems.
> The trivial cases give the same results, but the trickier cases (e.g. the
> 'blanks' and 'posix' tests) do differ.
>
> The test script is here: http://files.housegordon.org/tmp/test-unexpand-2.sh
> (the last 'ff' octet for AIX can be ignored, I suspect a bug in AIX's
> unexpand when lines are not '\n' terminated).
>
> Example (the inputs are 'blanks-1' and 'blanks-11' from
> <coreutils>/tests/misc/unexpand.pl):
>
> blanks-1   AIX-1                  09 62 09 09 63 09 09 09 64
> blanks-1   Darwin-14.4.0          20 62 09 20 63 09 09 20 64
> blanks-1   FreeBSD-10.1-RELEASE   20 62 09 20 63 09 09 20 64
> blanks-1   Linux-3.16.0-4-amd64   09 62 09 09 63 09 09 09 64
> blanks-1   SunOS-5.11             20 62 20 20 63 20 20 20 64
>
> blanks-11  AIX-1                  09 09 34
> blanks-11  Darwin-14.4.0          09 34
> blanks-11  FreeBSD-10.1-RELEASE   09 34
> blanks-11  Linux-3.16.0-4-amd64   09 09 34
> blanks-11  SunOS-5.11             09 20 34
>
>
> And so I wonder if it's best to leave unexpand's algorithm as-is, for the
> sake of backwards-compatibility (in case someone relies on coreutils'
> current behavior),
> and then focus back on multibyte character processing in 'expand' (with or
> without using the refactoring patches).

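For anyone wanting to reproduce byte listings like the ones quoted above, here is a minimal sketch. It assumes a POSIX 'od'; the function name 'hexbytes' is made up for illustration and is not part of the test script linked above.

```shell
# hexbytes: print stdin as space-separated hex octets on a single line.
# Hypothetical helper for comparing unexpand output across systems.
hexbytes() {
  # -An: no address column, -v: do not collapse repeated lines,
  # -tx1: one hex byte per field.
  od -An -v -tx1 |
    tr -s ' \n' ' ' |          # join lines and squeeze runs of blanks
    sed 's/^ *//; s/ *$//'     # trim leading/trailing blanks
}
```

It can be appended to any of the pipelines under discussion, e.g.
printf 'x\t \t y\n' | unexpand -t 8,9 | hexbytes
so that tabs (09) and spaces (20) are easy to tell apart.
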
I think the existing algorithm is fine,
and the refactoring patch should go in now.
We should move the two items from TODO to tests though,
to record this investigation.

  # comment that this should arguably minimize translation
  # as is done on Solaris, and not modify input, but at least
  # verify prints "x\t\t\t y\n"
  printf 'x\t \t y\n'|unexpand -t 8,9

  # verify prints "x\t\t y\n"
  printf 'x\t \t y\n'|unexpand -t 5,8

With that, plus the previous 'extern' patch adjustment I sent,
it's good to push.

thanks!
Pádraig
