On 2 September 2013 05:50, Roland Mainz <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 1:42 AM, Roland Mainz <[email protected]> wrote:
>> On Mon, Sep 2, 2013 at 1:09 AM, Roland Mainz <[email protected]> wrote:
>>> On Mon, Sep 2, 2013 at 1:06 AM, Roland Mainz <[email protected]> wrote:
>>>> On Mon, Sep 2, 2013 at 12:36 AM, Roland Mainz <[email protected]> wrote:
>>>>> On Mon, Aug 5, 2013 at 5:01 AM, Roland Mainz <[email protected]> wrote:
>>>>>> On Mon, Aug 5, 2013 at 4:13 AM, Roland Mainz <[email protected]> wrote:
>>> [snip]
>>>> ** More notes:
>>>> 1. $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "x\u[20ac]x\n" |
>>>> iconv -f ISO8859-15 -t UTF-8' # now works... the correct output is
>>>> "x€x"
>>>> 2. The reason why this didn't work in the *002* patch was that the
>>>> original code in ast-ksh.2013-08-29 used |wc2utf8()| on an "extended
>>>> single-byte locale" like "en_US.ISO8859-15" ... this can **never**
>>>> work because the locale is not UTF-8 based
>>>>
>>>> Glenn/David: What do you think about the patch ?
>>>
>>> I forgot one note:
>>> - The patch _explicitly_ uses |iconv()| even for UTF-8 locales to
>>> weed out unassigned codepoints to fulfill the Unicode requirement that
>>> no unassigned codepoints should be accessible.
>>
>> Last updated patch for tonight:
>>
>> Attached (as "astksh20130829_printf_w_gb18030_004.diff.txt") is an
>> updated version of the patch which now automagically uses "\u[hex]" as
>> output instead of "\w[hex]" for UTF-8 locales, making the output 100%
>> compatible with previous ksh93 versions except for the described bugs in
>> those versions.
>>
>> BTW: Some example usage for $ set -o convunicode # (byte "a4" is the
>> Euro character in ISO8859-15):
>> -- snip --
>> $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "euro=|%q|\n"
>> "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>> euro=|$'€'|
>> $ ksh -o convunicode -c 'export LC_ALL=en_US.ISO8859-15 ; printf
>> "euro=|%q|\n" "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>> euro=|$'\u[20ac]'|
>> -- snip --
>>
>> Comments/rants/etc. welcome...
>>
>> ... and David/Glenn: Please don't remove the comments in the code if
>> you take the patch... there's a reason why I'm quite verbose in the
>> comments (short: Hideously complex and lots of traps in the code) ...
>
> Attached (as "astksh20130829_printf_w_gb18030_005.diff.txt") is a
> fixed patch... the previous one missed a |continue;| statement which
> caused failures in the "locale.sh" test module (found by Wang Shouhua)
> ...
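
The point in note 2 above, that UTF-8 bytes can never be right in a single-byte locale, can be checked outside ksh. The following is only a minimal sketch in Python, not code from the patch: the helper name `encode_codepoint` is hypothetical, and Python's codec tables stand in for what the patch does with |iconv()|. It shows that U+20AC needs three bytes in UTF-8 but the single byte "a4" in ISO8859-15, and that a codepoint with no mapping in the target charset is rejected rather than silently emitted.

```python
def encode_codepoint(cp: int, charset: str) -> bytes:
    """Return the bytes for Unicode codepoint `cp` in `charset`.

    Hypothetical helper for illustration only; it raises
    UnicodeEncodeError when the charset has no mapping for `cp`.
    """
    return chr(cp).encode(charset)


euro = 0x20AC  # the codepoint used as \u[20ac] in the thread

# UTF-8 needs three bytes for the Euro sign ...
print(encode_codepoint(euro, "utf-8").hex())       # e282ac
# ... while ISO8859-15 uses the single byte a4, so writing the UTF-8
# bytes into an ISO8859-15 stream would produce the wrong characters.
print(encode_codepoint(euro, "iso8859-15").hex())  # a4

# A codepoint without a mapping in the target charset is an error,
# not something to pass through unconverted.
try:
    encode_codepoint(0x4E2D, "iso8859-15")  # a CJK ideograph
except UnicodeEncodeError as err:
    print("not representable in ISO8859-15:", err.reason)
```

This matches the byte "a4" that the convunicode example above feeds to printf in the en_US.ISO8859-15 locale.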

Roland, thank you very much for the patch. I've been testing it for the
last couple of hours in both Japanese and Chinese environments and have
to say: I am impressed. I can now address individual characters just by
their hexadecimal Unicode value, and it works in any locale. This
improves portability a lot and brings ksh93 into parity with Perl.

Thanks

Wendy
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
