On Sun, 6 Jun 2021 20:18:00 +0000, Farley, Peter x23353 wrote:
>
>> Hmmm.  In my desktop Linux, in sed regexen, both BRE and ERE, /./ matches a 
>> character.  In awk /./ matches an octet.  Grrr.
>> And printf field width specifications seem to assume octets, not characters.
>
>Both awk and the usual linux implementation (gawk) have no capability to 
>transparently process MBCS characters and have never had that capability.  
>They deal with 8-bit bytes only.
>
Just to be incompatible with sed.
(But how should length(), RSTART, RLENGTH, substr(), printf(), ... work?)
I would hardly have a problem with a rule that the result of processing
a purported text file with content inconsistent with LC_CTYPE is
undefined.
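
To make the octet/character distinction concrete, here is a minimal
sketch in Python rather than awk. The sample word and variable names
are my own, chosen only for illustration. It assumes UTF-8 encoding
and shows how a length and a padded field width differ depending on
which unit is counted.

```python
# Illustrative only: the sample word and the names below are hypothetical.
text = "na\u00efve"            # 5 characters; U+00EF is i with diaeresis
raw = text.encode("utf-8")     # the same text as UTF-8 octets

char_len = len(text)           # counts characters
octet_len = len(raw)           # counts octets; U+00EF takes two in UTF-8

# Pad to a field width of 8, first counting characters, then octets.
padded_chars = text.ljust(8)
padded_octets = raw.ljust(8).decode("utf-8")

print(char_len, octet_len)
print(repr(padded_chars), len(padded_chars))
print(repr(padded_octets), len(padded_octets))
```

The octet-padded field comes out one character short of the
character-padded one, which is the kind of misalignment a printf
field width produces when it counts octets.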

Much as I used MacOS for many years before stumbling into the
(undocumented?) restriction that pathnames must be valid UTF-8
strings, apparently regardless of locale.

>Numerous queries on the gawk-debug mailing list attest to that.
>
I don't subscribe.  I'll take your word for it.

Thanks,
gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
