I think you are testing too many things at once
do the basics first and then compose
assume a UTF-8 locale
$'\u[20ac]' is the Unicode euro sign, U+20AC
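to see why the euro makes a good multibyte exercise, here is a small hand-written C encoder (the name utf8_encode is hypothetical and no locale functions are involved). it shows that U+20AC is three bytes in UTF-8, E2 82 AC, so a byte-at-a-time tr would see three characters where a UTF-8-aware tr sees one

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into buf
   (at least 4 bytes). Returns the byte count, or 0 for a surrogate or a
   value above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {                         /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)        /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                      /* 3 bytes: the euro lands here */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

utf8_encode(0x20AC, buf) returns 3 and fills buf with E2 82 AC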
use standard range expressions a-z, not sysV [a-z]
(this way you can test other standard tr implementations)
map to [X*] instead of [\n*] to be easy on the eyes
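a minimal sketch of what a-z and [X*] mean, assuming code points are held in wchar_t. expand_set, tr_to_x and SETMAX are hypothetical names, not the ast tr parser, and only literal characters, standard x-y ranges, and a set2 of the form [X*] are handled (no escapes, classes, or [c*n])

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define SETMAX 256  /* hypothetical cap, plenty for these examples */

/* Expand literal characters and standard x-y ranges into out[].
   A '-' that is first or last in spec is taken literally. */
size_t expand_set(const wchar_t *spec, wchar_t *out)
{
    size_t n = 0;
    for (size_t i = 0; spec[i] != L'\0' && n < SETMAX; i++) {
        if (spec[i + 1] == L'-' && spec[i + 2] != L'\0') {
            for (wchar_t c = spec[i]; c <= spec[i + 2] && n < SETMAX; c++)
                out[n++] = c;
            i += 2;                  /* skip the "-hi" part of the range */
        } else {
            out[n++] = spec[i];
        }
    }
    return n;
}

/* Translate s in place with set1 = spec1 and set2 = [x*]: x is repeated
   to the length of set1, so every member of set1 maps to x. */
void tr_to_x(const wchar_t *spec1, wchar_t x, wchar_t *s)
{
    wchar_t set1[SETMAX], set2[SETMAX];
    size_t n1 = expand_set(spec1, set1);
    for (size_t k = 0; k < n1; k++)
        set2[k] = x;                 /* the [X*] fill */
    for (; *s != L'\0'; s++)
        for (size_t k = 0; k < n1; k++)
            if (*s == set1[k]) {
                *s = set2[k];
                break;
            }
}
```

with spec1 = L"a-z" and x = L'X', "hello, world" becomes "XXXXX, XXXXX"; the same call with L"\u20a0-\u20af" turns a euro into X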
first, -c in the C locale or in multibyte locales uses code point order
the way that is implemented is
(1) set1 is parsed completely, setting a bit for each selected code point
(2) the -c complement builds a new table holding, in code point order,
every code point that is not in set1
    int c, n;
    wchar_t ordered_set1[max_code_point];
    for (c = n = 0; c < max_code_point; c++)
        if (!in_set_1(c))
            ordered_set1[n++] = c;
(3) if -C were specified instead then ordered_set1[] would be sorted
according to the LC_COLLATE locale setting
(4) ordered_set1[] is then mapped 1-1 onto set2[], which
is ordered left to right, i.e., in the order given on the command line
(5) this means that for -c and -C the specification order for set1 does not
matter
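in the tests below \n is always added to set1 for -c/-C; otherwise the
trailing newline would be in the complement, get translated, and the output
would have no trailing newline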
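steps (1), (2), (4) and (5) above can be sketched in C like this. it is a minimal illustration, not the ast code: the names (complement_table, tr_complement, cpset, MAX_CP), uint32_t code points, and passing set2 as a literal prefix followed by a [fill*] repeat are all assumptions. the -C sort of step (3) is left out

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CP 0x110000u   /* assumed bound: one past U+10FFFF */

/* step (1): one bit per code point selected by set1 */
typedef struct { unsigned char bit[MAX_CP / 8]; } cpset;

static void cpset_add(cpset *s, uint32_t c) { s->bit[c >> 3] |= (unsigned char)(1u << (c & 7)); }
static int cpset_has(const cpset *s, uint32_t c) { return (s->bit[c >> 3] >> (c & 7)) & 1; }

/* step (2): code points NOT in set1, ascending; caller frees.
   (-C would sort this table by LC_COLLATE here; not shown.) */
uint32_t *complement_table(const uint32_t *set1, size_t len1, size_t *n)
{
    cpset *s = calloc(1, sizeof *s);
    uint32_t *ordered = malloc(MAX_CP * sizeof *ordered);
    if (!s || !ordered) { free(s); free(ordered); return NULL; }
    for (size_t i = 0; i < len1; i++)
        if (set1[i] < MAX_CP)
            cpset_add(s, set1[i]);   /* set1 order is irrelevant: step (5) */
    *n = 0;
    for (uint32_t c = 0; c < MAX_CP; c++)
        if (!cpset_has(s, c))
            ordered[(*n)++] = c;
    free(s);
    return ordered;
}

/* step (4): pair ordered[k] with set2[k] left to right, where set2 is
   prefix[0..plen) followed by [fill*]; translates s[0..len) in place */
int tr_complement(const uint32_t *set1, size_t len1,
                  const uint32_t *prefix, size_t plen, uint32_t fill,
                  uint32_t *s, size_t len)
{
    size_t n;
    uint32_t *ordered = complement_table(set1, len1, &n);
    uint32_t *map = malloc(MAX_CP * sizeof *map);
    if (!ordered || !map) { free(ordered); free(map); return -1; }
    for (uint32_t c = 0; c < MAX_CP; c++)
        map[c] = c;                  /* members of set1 pass through */
    for (size_t k = 0; k < n; k++)
        map[ordered[k]] = k < plen ? prefix[k] : fill;
    for (size_t i = 0; i < len; i++)
        if (s[i] < MAX_CP)
            s[i] = map[s[i]];
    free(ordered);
    free(map);
    return 0;
}
```

with set1 = {U+20AC, \n} and fill X, the input euro-plus-newline comes through unchanged, matching the -c test below; drop \n from set1 and the newline lands in the complement and becomes X. building with {\n, U+20AC} instead gives the same table, which is point (5)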
here's a start for some tests in regress(1) form
copy the lines between the -- markers to tr.tst and run
regress tr.tst
or to test other tr implementations
regress tr.tst /usr/xpg1234/bin/tr
now when the discussion ends we'll have a regression test to add to the packages
--
UNIT tr
TEST 01 'multibyte exercises'
EXPORT LC_CTYPE=en_US.UTF-8
EXEC $'\u[20ac]' '[X*]'
INPUT - $'\u[20ac]'
OUTPUT - $'X'
EXEC $'\u[20a0]-\u[20af]' '[X*]'
EXEC -c $'\u[20ac]\n' '[X*]'
OUTPUT - $'\u[20ac]'
EXEC -c $'\u[20a0]-\u[20af]\n' '[X*]'
--
_______________________________________________
ast-developers mailing list
http://lists.research.att.com/mailman/listinfo/ast-developers