Hi Assaf!
> One thing I noticed is that the tests fail on my computer.
> I see things like:
> ====
> cut-multibyte.pl: test mbd-newline-24: stdout mismatch, comparing
> mbd-newline-24.2 (expected) and mbd-newline-24.O (actual)
> *** mbd-newline-24.2 Mon Sep 18 20:02:46 2017
> --- mbd-newline-24.O Mon Sep 18 20:02:46 2017
> ***************
> *** 1 ****
> ! aꝤb
> --- 1 ----
> ! a$Ꝥb
> ====
> It is the extra dollar sign before the multibyte character which hints
> to me it is related to the interaction between Perl
> (which converts \xNN sequences) and the shell command line
> (where you've used the $'\xNN' syntax).
>
> The test was:
> ['mbd-newline-24', "-d'\n'", '-f1,2', "--ou=\$'\xEA\x9D\xA4'",
> {IN=>"a\nb\n"}, {OUT=>"a\xEA\x9D\xA4b\n"}],
>
>
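My reading of the failure, which is a guess: Perl expands the \xNN escapes inside the double-quoted string, so the shell only ever sees --ou=$'Ꝥ'. A /bin/sh without $'...' support then likely keeps the $ as a literal character, which would explain the extra dollar sign. As a rough sketch (the variable names are mine, not from the patch), the same three bytes can be produced without $'...' by using printf octal escapes:

```shell
# U+A764 is the bytes EA 9D A4 in UTF-8, i.e. octal 352 235 244.
# POSIX printf understands octal escapes in its format string, so this
# does not depend on the shell supporting the $'\xNN' quoting form.
delim=$(printf '\352\235\244')

# Dump the bytes as hex to confirm what the program under test would receive.
delim_hex=$(printf '%s' "$delim" | od -An -tx1 | tr -d ' \n')
echo "$delim_hex"
```

This only illustrates the quoting issue; in the test suite itself Perl can emit the bytes directly.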
Thanks for testing! I was able to reproduce it, and it should be fine
with the -d'\x{NN}' form mentioned below, which I used.

> Also,
> I'm not sure if coreutils currently allows the newer $'\xNN' construct
> in tests - this might be too new to be supported everywhere (comments,
> anyone? I'll also try to look for them in other tests).
>
> In any case, Perl itself can easily generate UTF-8 characters and send
> them as-is to the program being tested, I think that will suffice.
>
>
> Planning ahead, since this is going to be a large addition,
> we'll need to ask you for copyright assignment for your code contributions.
>
> You can read more about it here:
> https://www.gnu.org/licenses/why-assign.en.html
> https://www.fsf.org/licensing/assigning.html
>
> To begin the process, please fill the information here:
> https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
>
> and send it to [email protected] .
>

Thanks, I will.

Attached is a patch with fixed tests. (I should probably add even more tests anyway.)

Sebastián.
cut-multibyte-delimiter.tar.gz
Description: GNU Zip compressed data
