On Fri, Mar 15, 2013 at 5:27 PM, Dr. Werner Fink <[email protected]> wrote:
> On Fri, Mar 15, 2013 at 03:57:20PM +0100, Cedric Blancher wrote:
>> On 14 March 2013 23:01, Roland Mainz <[email protected]> wrote:
>> > On Thu, Mar 14, 2013 at 2:19 PM, Cedric Blancher
>> > <[email protected]> wrote:
>> >> How do I match accented e (i.e. é) using an equivalence class in AST tr?
>> >>
>> >> Doing that in sed is easy:
>> >> ~/bin/sed -r "s/[[=e=]]/X/g" <<<"8é8" ; printf "\n"
>> >> 8X8
>> >>
>> >> But in tr I am not able to get it working:
>> >> ksh -c 'builtin tr ; tr -Cd "[=e=]" <<<"1e2é3" ; print'
>> >> e
>> >>
>> >> AFAIK this should print "eé".
>> >>
>> >> I used:
>> >>   version         tr (AT&T Research) 2012-11-12
>> >>   version         sed (AT&T Research) 2012-03-28
>> >
>> > Erm... without digging around... does AST "tr" support the POSIX
>> > equivalence class syntax yet (Glenn... ping!) ? My first guess would
>> > be to try another platform like Solaris to see if the issue is
>> > libc-related...
>>
>> Glenn, does AST tr support the [=e=] syntax?
>> Werner, does GNU tr support the [=e=] syntax?
>
> The manual page of tr says:
>
>
>        [=CHAR=]
>               all characters which are equivalent to CHAR
>
> ... nevertheless
>
>   werner@noether:~> echo $LANG
>   POSIX
>   werner@noether:~> tr -Cd "[=a=]" <<<"1e2b3a"; echo
>   a
>   werner@noether:~> tr -d "[=a=]" <<<"1e2b3a"
>   1e2b3
>   werner@noether:~> tr -d "[:alpha:]" <<<"1e2b3a"
>   123
>   werner@noether:~> LANG=fr_FR.UTF-8
>   werner@noether:~> tr -Cd "[=e=]" <<<"1e2é3a"; echo
>   e
>   werner@noether:~> tr -d "[=e=]" <<<"1e2é3a"
>   12é3a
>   werner@noether:~> tr -d "[:alpha:]" <<<"1e2é3a"
>   12é3
>   werner@fatou:~> tr -Cs "[=e=]" '[\n*]' <<<"1e2é3a"
>
>   e
>   werner@fatou:~> tr -s "[=e=]" '[\n*]' <<<"1e2é3a"
>   1
>   2é3a
>
>
> ... it seems that multibyte characters may cause problems as well
> as equivalence classes.  The tr is from GNU coreutils 8.17.

Maybe AST tr uses the wrong libast regex function?
I noticed this:
/usr/ast/bin/sed -E "s/[=e=]/X/g" <<<"1e2é3ae4"
1X2é3aX4
/usr/ast/bin/sed -E "s/[[=e=]]/X/g" <<<"1e2é3ae4"
1X2X3aX4

In the first sed example é is not matched by [=e=] but the second
matches it with [[=e=]]. Maybe AST tr must call the regex function for
[[=e=]] and not [=e=]?
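
One possible reading, offered as a sketch rather than a confirmed diagnosis:
in a regular expression, [=e=] on its own is an ordinary bracket expression
listing the characters '=' and 'e', so it is not an equivalence class at all.
Only inside outer brackets, as [[=e=]], does [=e=] name the class. POSIX tr,
by contrast, defines [=e=] as its own operand syntax, so the sed result may
not say much about which call tr needs. The input string below is made up for
illustration, and the outputs in the comments assume a sed whose regex
library accepts equivalence classes in the C locale.

```shell
# Input chosen for illustration: it contains a literal '=' and an 'e'.
input='1=2e3'

# [=e=] alone is a bracket expression that matches '=' or 'e'.
plain=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[=e=]/X/g')

# [[=e=]] is a bracket expression containing the equivalence class of 'e';
# in the C locale that class holds only 'e' itself.
nested=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[[=e=]]/X/g')

printf '%s\n' "$plain"    # 1X2X3
printf '%s\n' "$nested"   # 1=2X3
```

If that reading is right, the first sed command above replaced 'e' only
because 'e' is listed literally in the set, and it would have replaced any
'=' in the input as well.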

Simon