On Fri, Mar 22, 2013 at 4:28 AM, Simon Toedt <[email protected]> wrote:
> On Fri, Mar 15, 2013 at 5:27 PM, Dr. Werner Fink <[email protected]> wrote:
>> On Fri, Mar 15, 2013 at 03:57:20PM +0100, Cedric Blancher wrote:
>>> On 14 March 2013 23:01, Roland Mainz <[email protected]> wrote:
>>> > On Thu, Mar 14, 2013 at 2:19 PM, Cedric Blancher
>>> > <[email protected]> wrote:
>>> >> How do I match accented e (i.e. é) using an equivalence class in AST tr?
>>> >>
>>> >> Doing that in sed is easy:
>>> >> ~/bin/sed -r "s/[[=e=]]/X/g" <<<"8é8" ; printf "\n"
>>> >> 8X8
>>> >>
>>> >> But in tr I am not able to get it working:
>>> >> ksh -c 'builtin tr ; tr -Cd "[=e=]" <<<"1e2é3" ; print'
>>> >> e
>>> >>
>>> >> AFAIK this should print "eé".
>>> >>
>>> >> I used:
>>> >>   version         tr (AT&T Research) 2012-11-12
>>> >>   version         sed (AT&T Research) 2012-03-28
>>> >
>>> > Erm... without digging around... does AST "tr" support the POSIX
>>> > equivalence class syntax yet (Glenn... ping!) ? My first guess would
>>> > be to try another platform like Solaris to see if the issue is
>>> > libc-related...
>>>
>>> Glenn, does AST tr support the [=e=] syntax?
>>> Werner, does GNU tr support the [=e=] syntax?
>>
>> The manual page of tr says:
>>
>>
>>        [=CHAR=]
>>               all characters which are equivalent to CHAR
>>
>> ... nevertheless
>>
>>   werner@noether:~> echo $LANG
>>   POSIX
>>   werner@noether:~> tr -Cd "[=a=]" <<<"1e2b3a"; echo
>>   a
>>   werner@noether:~> tr -d "[=a=]" <<<"1e2b3a"
>>   1e2b3
>>   werner@noether:~> tr -d "[:alpha:]" <<<"1e2b3a"
>>   123
>>   werner@noether:~> LANG=fr_FR.UTF-8
>>   werner@noether:~> tr -Cd "[=e=]" <<<"1e2é3a"; echo
>>   e
>>   werner@noether:~> tr -d "[=e=]" <<<"1e2é3a"
>>   12é3a
>>   werner@noether:~> tr -d "[:alpha:]" <<<"1e2é3a"
>>   12é3
>>   werner@fatou:~> tr -Cs "[=e=]" '[\n*]' <<<"1e2é3a"
>>
>>   e
>>   werner@fatou:~> tr -s "[=e=]" '[\n*]' <<<"1e2é3a"
>>   1
>>   2é3a
>>
>>
>> ... it seems that multibyte may cause problems as well
>> as equivalence classes.  The tr is from GNU coreutils 8.17.
>
> Maybe AST tr uses the wrong libast regex function?
> I noticed this:
> /usr/ast/bin/sed -E "s/[=e=]/X/g" <<<"1e2é3ae4"
> 1X2é3aX4
> /usr/ast/bin/sed -E "s/[[=e=]]/X/g" <<<"1e2é3ae4"
> 1X2X3aX4
>
> In the first sed example é is not matched by [=e=] but the second
> matches it with [[=e=]]. Maybe AST tr must call the regex function for
> [[=e=]] and not [=e=]?
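
One possible reading of the two sed results, as a sketch rather than a statement about the AST code: in POSIX regular expressions an equivalence class only exists inside a bracket expression, so [=e=] on its own is an ordinary bracket expression listing the characters '=' and 'e', while [[=e=]] is a bracket expression containing the class. The check below assumes any POSIX sed on PATH (not specifically AST sed) and uses the C locale with ASCII-only input so it does not depend on locale data. The variable names are only for illustration.

```shell
# Sketch: assumes a POSIX sed on PATH; the C locale and ASCII input
# keep the result independent of installed locale data.
input='1=2e3'

# [=e=] by itself: a bracket expression listing the characters '=' and 'e'.
plain=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[=e=]/X/g')

# [[=e=]]: a bracket expression containing the equivalence class of 'e'.
equiv=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[[=e=]]/X/g')

printf 'plain: %s\nequiv: %s\n' "$plain" "$equiv"
```

If the first substitution also replaces the '=' in the input, the pattern is being read as a plain character list. Note that POSIX tr takes [=equiv=] directly as an operand form, so the tr syntax used earlier in the thread is not necessarily wrong; this only illustrates the regex side.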

Glenn and Cedric, what do you think?

Simon
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers