On 17 November 2012 11:25, Roland Mainz <[email protected]> wrote:
> On Fri, Nov 16, 2012 at 6:00 PM, Roland Mainz <[email protected]>
> wrote:
>> On Fri, Nov 16, 2012 at 5:57 PM, Roland Mainz <[email protected]>
>> wrote:
>>> The following testcase (which should basically test whether the
>>> SystemV "tr" range expression [a-z] works with 'a' and 'z' replaced
>>> with \u[20a0] and \u[20af] ...) ...
>>> -- snip --
>>> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>>> $\'[:digit:][\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello
>>> chicken \u[20ac] world\' ; true'
>>> -- snip --
>>> ... should AFAIK print something like this:
>>> -- snip --
>>> + builtin tr
>>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>>> + 0<<< hello chicken € world
>>> hello
>>> chicken
>>>
>>>
>>> world
>>> + true
>>>
>>> -- snip --
>>> ... but ast-ksh.2012-11-24 with Glenn's latest tr.c changes gives this
>>> output:
>>> -- snip --
>>> + builtin tr
>>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>>> + 0<<< hello chicken € world
>>> hello
>>> chicken
>>> €
>>> world
>>> + true
>>>
>>> -- snip --
>>>
>>> Erm... does anyone spot the mistake ? Or is this a AST "tr" bug ?
>>
>> BTW: It seems to work if I remove the leading [:digit:] expression:
>> -- snip --
>> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> $\'[\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello chicken
>> \u[20ac] world\' ; true'
>> + builtin tr
>> + tr -c $'[\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> + 0<<< hello chicken € world
>> hello
>> chicken
>> €
>> world
>> + true
>> -- snip --
>
> ... or if I put the [:digit:] at the end:
> -- snip --
> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
> $\'[\u[20a0]-\u[20af]][:alpha:][:digit:]\' "[\\n*]" <<<$\'hello
> chicken 6a \u[20ac] world\' ; true'
> + builtin tr
> + tr -c $'[\u[20a0]-\u[20af]][:alpha:][:digit:]' '[\n*]'
> + 0<<< hello chicken 6a € world
> hello
> chicken
> 6a
> €
> world
> + true
> -- snip --
>
> ... erm... question for Glenn:
> Must range patterns (e.g. [a-z] or 'a' and 'z' replaced by Unicode
> characters) be sorted before character classes like [:digit:] or
> [:alpha:] (this may be a case where a --strict option should
> warn/complain if the arguments must be sorted) ?

The current implementation requires the elements of the set argument to be ordered: literal characters first, then ranges, and finally character classes (e.g. [:digit:]). I don't see anything in the standard that requires this, though. Glenn, can you elaborate on this?

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
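
For comparison with other tr implementations, here is a minimal sketch of an ASCII-only analogue of the testcase. It is an assumption-laden stand-in, not the original test: the range `!-/` and the '+' character are hypothetical substitutes for \u[20a0]-\u[20af] and \u[20ac], the range is written in the unbracketed POSIX form rather than the SystemV [a-z] form, and the helper name `split_nonmembers` is made up for illustration. It feeds the same three set elements to `tr -c` in each of the orders discussed above, so any dependence on ordering shows up as differing output.

```shell
#!/bin/sh
# ASCII analogue of the thread's testcase (hypothetical stand-in):
# '+' (0x2b) lies inside the range '!-/' (0x21-0x2f), just as
# U+20AC lies inside U+20A0-U+20AF in the original.
LC_ALL=C; export LC_ALL   # byte-valued ranges, no locale surprises

# split_nonmembers SET1 TEXT
# Replace every character NOT in SET1 with a newline.
split_nonmembers() {
    printf '%s\n' "$2" | tr -c "$1" '[\n*]'
}

input='hello chicken 6a + world'

# Same elements, three orders: class first, class last, range last.
for set1 in '[:digit:]!-/[:alpha:]' \
            '!-/[:alpha:][:digit:]' \
            '[:alpha:][:digit:]!-/'
do
    printf '== %s ==\n' "$set1"
    split_nonmembers "$set1" "$input"
done
```

If the order of elements does not matter to the tr under test, all three runs should print the same five lines (hello, chicken, 6a, +, world). This only exercises single-byte characters, so it says nothing about how the AST builtin handles the multibyte range itself.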
