On 3 December 2012 18:16, Glenn Fowler <[email protected]> wrote:
>
> On Mon, 3 Dec 2012 18:07:00 +0100 Cedric Blancher wrote:
>> On 20 November 2012 16:27, Glenn Fowler <[email protected]> wrote:
>> >
>> > On Tue, 20 Nov 2012 10:04:36 +0100 Cedric Blancher wrote:
>> >> On 17 November 2012 11:25, Roland Mainz <[email protected]> wrote:
>> >> > On Fri, Nov 16, 2012 at 6:00 PM, Roland Mainz 
>> >> > <[email protected]> wrote:
>> >> >> On Fri, Nov 16, 2012 at 5:57 PM, Roland Mainz 
>> >> >> <[email protected]> wrote:
>> >> >>> The following testcase (which should basically test whether the
>> >> >>> SystemV "tr" range expression [a-z] works with 'a' and 'z' replaced
>> >> >>> with \u[20a0] and \u[20af] ...) ...
>> >> >>> -- snip --
>> >> >>> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> >>> $\'[:digit:][\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello
>> >> >>> chicken \u[20ac] world\' ; true'
>> >> >>> -- snip --
>> >> >>> ... should AFAIK print something like this:
>> >> >>> -- snip --
>> >> >>> + builtin tr
>> >> >>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >>> + 0<<< hello chicken € world
>> >> >>> hello
>> >> >>> chicken
>> >> >>>
>> >> >>>
>> >> >>> world
>> >> >>> + true
>> >> >>>
>> >> >>> -- snip --
>> >> >>> ... but ast-ksh.2012-11-24 with Glenn's latest tr.c changes gives 
>> >> >>> this output:
>> >> >>> -- snip --
>> >> >>> + builtin tr
>> >> >>> + tr -c $'[:digit:][\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >>> + 0<<< hello chicken € world
>> >> >>> hello
>> >> >>> chicken
>> >> >>> €
>> >> >>> world
>> >> >>> + true
>> >> >>>
>> >> >>> -- snip --
>> >> >>>
>> >> >>> Erm... does anyone spot the mistake ? Or is this an AST "tr" bug ?
>> >> >>
>> >> >> BTW: It seems to work if I remove the leading [:digit:] expression:
>> >> >> -- snip --
>> >> >> $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> >> $\'[\u[20a0]-\u[20af]][:alpha:]\' "[\\n*]" <<<$\'hello chicken
>> >> >> \u[20ac] world\' ; true'
>> >> >> + builtin tr
>> >> >> + tr -c $'[\u[20a0]-\u[20af]][:alpha:]' '[\n*]'
>> >> >> + 0<<< hello chicken € world
>> >> >> hello
>> >> >> chicken
>> >> >> €
>> >> >> world
>> >> >> + true
>> >> >> -- snip --
>> >> >
>> >> > ... or if I put the [:digit:] at the end:
>> >> > -- snip --
>> >> > $ ~/bin/ksh -x -c $'builtin tr ; tr -c
>> >> > $\'[\u[20a0]-\u[20af]][:alpha:][:digit:]\' "[\\n*]" <<<$\'hello
>> >> > chicken 6a \u[20ac] world\' ; true'
>> >> > + builtin tr
>> >> > + tr -c $'[\u[20a0]-\u[20af]][:alpha:][:digit:]' '[\n*]'
>> >> > + 0<<< hello chicken 6a € world
>> >> > hello
>> >> > chicken
>> >> > 6a
>> >> > €
>> >> > world
>> >> > + true
>> >> > -- snip --
>> >> >
>> >> > ... erm... question for Glenn:
>> >> > Must range patterns (e.g. [a-z] or 'a' and 'z' replaced by Unicode
>> >> > characters) be sorted before character classes like [:digit:] or
>> >> > [:alpha:] (this may be a case where a --strict option should
>> >> > warn/complain if the arguments must be sorted) ?
>> >
>> >> The current implementation requires the argument to be sorted -
>> >> characters first, then ranges and finally character classes
>> >> ([:digit:]) - but I'm not seeing that the standard requires this.
>> >> Glenn, can you elaborate on this?
>> >
>> > the current implementation of ast tr?
>
>>  ./arch/linux.i386-64/bin/ksh -c 'builtin tr ; tr --version'
>>   version         tr (AT&T Research) 2012-11-12
>
>> Rephrasing my question:
>> 1. Does the standard, whatever its name or version, require the tr
>> arguments to be sorted like regex arguments need to be sorted?
>> 2. Does the current AST tr implementation (tr (AT&T Research)
>> 2012-11-12) require the arguments to be sorted?
>
> right, that clarifies "current implementation"
>
> can you point to the text in the standard that
> "requires the argument to be sorted"
>
> ast tr does not require any specific ordering on the user's part
> but note that for -C the user and tr implementation are constrained by
> the collation order in the current locale whereby one command line
> could produce different results for each locale with a differing
> collation order
>
> I can't fathom reliable usage of -C in portable scripts

Can you fathom reliable usage of tr -C when the locale is using UTF-8
encoding and follows Unicode standard conventions, i.e. the Unicode
standard collation order?
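
The ordering question raised earlier in the thread (whether the result changes when character classes, ranges and single characters appear in a different order in the first operand) can be checked empirically. The following is only a minimal sketch: `check_order` is a hypothetical helper name, the input is ASCII-only, and it uses the plain `a-f` range form under `LC_ALL=C` rather than the bracketed SystemV form or the \u[20a0]-\u[20af] range from the original testcase, since multibyte handling differs between tr implementations. It says nothing about -C and locale collation order.

```shell
# Hypothetical helper: complement the given SET1 and turn every
# character outside it into a newline, over a fixed ASCII input.
check_order() {
    printf 'hello chicken 6a world\n' | LC_ALL=C tr -c "$1" '\n'
}

# Same members of SET1, three different orderings.
a=$(check_order '[:digit:]a-f[:alpha:]')
b=$(check_order 'a-f[:alpha:][:digit:]')
c=$(check_order '[:alpha:][:digit:]a-f')

# If SET1 is treated as a set, all three outputs must be identical.
if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    echo "order-independent"
else
    echo "order-dependent"
fi
```

To try the same check against the AST builtin, one would run it under ksh after `builtin tr` and substitute the Unicode range; that variant is left out here because its correct output is exactly what the thread is disputing.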

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
