On 3 December 2012 18:16, Glenn Fowler <[email protected]> wrote:
>
> On Mon, 3 Dec 2012 18:07:00 +0100 Cedric Blancher wrote:
>> On 20 November 2012 16:27, Glenn Fowler <[email protected]> wrote:
>> >
>> > On Tue, 20 Nov 2012 10:04:36 +0100 Cedric Blancher wrote:
>> >> On 17 November 2012 11:25, Roland Mainz <[email protected]> wrote:
>> >> > On Fri, Nov 16, 2012 at 6:00 PM, Roland Mainz
>> >> > <[email protected]> wrote:
>> >> >> On Fri, Nov 16, 2012 at 5:57 PM, Roland Mainz
>> >> >> <[email protected]> wrote:
>> >> >>> The following testcase (which should basically test whether the
>> >> >>> SystemV "tr" range expression [a-z] works with 'a' and 'z' replaced
>> >> >>> with \u[20a0] and \u[20af] ...) ...
>> >> >>> -- snip --
>> >> >>> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> >>> $\'[:digit:][\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello
>> >> >>> chicken \u[20ac] world\' ; true'
>> >> >>> -- snip --
>> >> >>> ... should AFAIK print something like this:
>> >> >>> -- snip --
>> >> >>> + builtin tr
>> >> >>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >>> + 0<<< hello chicken € world
>> >> >>> hello
>> >> >>> chicken
>> >> >>>
>> >> >>>
>> >> >>> world
>> >> >>> + true
>> >> >>>
>> >> >>> -- snip --
>> >> >>> ... but ast-ksh.2012-11-24 with Glenn's latest tr.c changes gives
>> >> >>> this output:
>> >> >>> -- snip --
>> >> >>> + builtin tr
>> >> >>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >>> + 0<<< hello chicken € world
>> >> >>> hello
>> >> >>> chicken
>> >> >>> €
>> >> >>> world
>> >> >>> + true
>> >> >>>
>> >> >>> -- snip --
>> >> >>>
>> >> >>> Erm... does anyone spot the mistake ? Or is this a AST "tr" bug ?
>> >> >>
>> >> >> BTW: It seems to work if I remove the leading [:digit:] expression:
>> >> >> -- snip --
>> >> >> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> >> $\'[\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello chicken
>> >> >> \u[20ac] world\' ; true'
>> >> >> + builtin tr
>> >> >> + tr -c $'[\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >> + 0<<< hello chicken € world
>> >> >> hello
>> >> >> chicken
>> >> >> €
>> >> >> world
>> >> >> + true
>> >> >> -- snip --
>> >> >
>> >> > ... or if I put the [:digit:] at the end:
>> >> > -- snip --
>> >> > $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> > $\'[\u[20a0]-\u[20af]][:alpha:][:digit:]\' "[\\n*]" <<<$\'hello
>> >> > chicken 6a \u[20ac] world\' ; true'
>> >> > + builtin tr
>> >> > + tr -c $'[\u[20a0]-\u[20af]][:alpha:][:digit:]' '[\n*]'
>> >> > + 0<<< hello chicken 6a € world
>> >> > hello
>> >> > chicken
>> >> > 6a
>> >> > €
>> >> > world
>> >> > + true
>> >> > -- snip --
>> >> >
>> >> > ... erm... question for Glenn:
>> >> > Must range patterns (e.g. [a-z] or 'a' and 'z' replaced by Unicode
>> >> > characters) be sorted before character classes like [:digit:] or
>> >> > [:alpha:] (this may be a case where a --strict option should
>> >> > warn/complain if the arguments must be sorted) ?
>> >
>> >> The current implementation requires the argument to be sorted -
>> >> characters first, then ranges and finally character classes
>> >> ([:digit:]) - but I'm not seeing that the standard requires this.
>> >> Glenn, can you elaborate on this?
>> >
>> > the current implementation of ast tr?
>
>> ./arch/linux.i386-64/bin/ksh -c 'builtin tr ; tr --version'
>> version tr (AT&T Research) 2012-11-12
>
>> Rephrasing my question:
>> 1. Does the standard, whatever it's name or version, require the tr
>> arguments to be sorted like regex arguments need to be sorted?
>> 2. Does the current AST tr implementation (tr (AT&T Research)
>> 2012-11-12) require the arguments to be sorted?
>
> right, that clarifies "current implementation"
>
> can you point to the text in the standard that
> "requires the argument to be sorted"
>
> ast tr does not require any specific ordering on the user's part
> but note that for -C the user and tr implementation are constrained by
> the collation order in the current locale whereby one command line
> could produce different results for each locale with a differing
> collation order
>
> I can't fathom reliable usage of -C in portable scripts
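
For reference, the "no specific ordering" statement can be checked against any POSIX-style tr with something like the sketch below. It is only an approximation of the testcase quoted above: an ASCII range stands in for \u[20a0]-\u[20af], LC_ALL=C pins the collation order, the range is written a-z (POSIX syntax) rather than the SystemV [a-z], and the variable names are made up for this illustration.

```shell
# Sketch only: ASCII stand-in for the Unicode range in the quoted testcase.
# LC_ALL=C fixes the collation order so the result does not depend on the
# caller's locale. All variable names here are hypothetical.
input='hello chicken 6a world'

# Character class first, then the range, then another class.
classes_first=$(printf '%s\n' "$input" |
    LC_ALL=C tr -c '[:digit:]a-z[:alpha:]' '[\n*]')

# Range first, character classes afterwards.
range_first=$(printf '%s\n' "$input" |
    LC_ALL=C tr -c 'a-z[:alpha:][:digit:]' '[\n*]')

# If operand order does not matter, both results are identical.
if [ "$classes_first" = "$range_first" ]; then
    echo "same output for both orderings"
else
    echo "output differs between orderings"
fi
```

If the two results differ, the tr under test is sensitive to the order of the operands in the first string; with the same output, the ordering did not matter for this input.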

Can you fathom reliable usage of tr -C when the locale uses UTF-8
encoding and follows Unicode standard conventions, i.e. the Unicode
standard collation order?

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
