On Sun, 6 Jun 2021 20:18:00 +0000, Farley, Peter x23353 wrote:
>
>> Hmmm. In my desktop Linux, in sed regexen, both BRE and ERE, /./ matches a
>> character. In awk /./ matches an octet. Grrr.
>> And printf field width specifications seem to assume octets, not characters.
>
>Both awk and the usual linux implementation (gawk) have no capability to
>transparently process MBCS characters and have never had that capability.
>They deal with 8-bit bytes only.
>
Just to be incompatible with sed. (But how should length(), RSTART,
RLENGTH, substr(), printf(), ... work?)

I would hardly have a problem with a rule that the result of processing
a purported text file with content inconsistent with LC_CTYPE is undefined.
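
For anyone who wants to see the character/octet distinction above in isolation, here is a minimal sketch in Python rather than sed or awk. It assumes the text is UTF-8, the function names are made up for illustration, and it makes no claim about how any particular sed or awk implementation behaves.

```python
import re

# Illustrative sample: "\u00e9t\u00e9" is 3 characters but 5 octets in UTF-8.
SAMPLE = "\u00e9t\u00e9"


def first_dot_match(text):
    """Return what the regex . matches first, per character and per octet."""
    octets = text.encode("utf-8")          # assumption: UTF-8 encoding
    as_char = re.match(r".", text).group()     # str pattern: one code point
    as_octet = re.match(rb".", octets).group() # bytes pattern: one octet
    return as_char, as_octet


def lengths(text):
    """Return (character count, octet count) for the same text."""
    return len(text), len(text.encode("utf-8"))


def pad_by_chars(text, width):
    """Left-justify counting characters, as a character-aware printf might."""
    return text.ljust(width)


def pad_by_octets(text, width):
    """Left-justify counting octets, as a byte-oriented printf might."""
    return text.encode("utf-8").ljust(width, b" ")


if __name__ == "__main__":
    print(first_dot_match(SAMPLE))
    print(lengths(SAMPLE))
    print(repr(pad_by_chars(SAMPLE, 5)))
    print(repr(pad_by_octets(SAMPLE, 5)))
```

With a field width of 5, the character-based version adds two spaces, while the octet-based version adds none because the sample already occupies 5 octets. That is the kind of misaligned column the printf complaint describes.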
Even as I used MacOS for many years before stumbling into the
(undocumented?) restriction that pathnames must be valid UTF-8
strings, apparently regardless of locale.

>Numerous queries on the gawk-debug mailing list attest to that.
>
I don't subscribe. I'll take your word for it.

Thanks,
gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
