On 2 September 2013 17:11, Wendy Lin <[email protected]> wrote:
> On 2 September 2013 05:50, Roland Mainz <[email protected]> wrote:
>> On Mon, Sep 2, 2013 at 1:42 AM, Roland Mainz <[email protected]>
>> wrote:
>>> On Mon, Sep 2, 2013 at 1:09 AM, Roland Mainz <[email protected]>
>>> wrote:
>>>> On Mon, Sep 2, 2013 at 1:06 AM, Roland Mainz <[email protected]>
>>>> wrote:
>>>>> On Mon, Sep 2, 2013 at 12:36 AM, Roland Mainz <[email protected]>
>>>>> wrote:
>>>>>> On Mon, Aug 5, 2013 at 5:01 AM, Roland Mainz <[email protected]>
>>>>>> wrote:
>>>>>>> On Mon, Aug 5, 2013 at 4:13 AM, Roland Mainz <[email protected]>
>>>>>>> wrote:
>>>> [snip]
>>>>> ** More notes:
>>>>> 1. $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "x\u[20ac]x\n" |
>>>>> iconv -f ISO8859-15 -t UTF-8' # now works... the correct output is
>>>>> "x€x"
>>>>> 2. The reason why this didn't work in the *002* patch was that the
>>>>> original code in ast-ksh.2013-08-29 used |wc2utf8()| on an "extended
>>>>> single-byte locale" like "en_US.ISO8859-15" ... this can **never**
>>>>> work because the locale is not UTF-8 based
>>>>>
>>>>> Glenn/David: What do you think about the patch ?
>>>>
>>>> I forgot one note:
>>>> - The patch _explicitly_ uses |iconv()| even for UTF-8 locales to
>>>> weed-out unassigned codepoints to fulfill the Unicode requirement that
>>>> no unassigned codepoints should be accessible.
>>>
>>> Last updated patch for tonight:
>>>
>>> Attached (as "astksh20130829_printf_w_gb18030_004.diff.txt") is an
>>> updated version of the patch which now automagically uses "\u[hex]" as
>>> output instead of "\w[hex]" for UTF-8 locales, making the output 100%
>>> compatible to previous ksh93 versions except for the described bugs in
>>> those versions.
>>>
>>> BTW: Some example usage for $ set -o convunicode # (byte "a4" is the
>>> Euro character in ISO8859-15):
>>> -- snip --
>>> $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "euro=|%q|\n"
>>> "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>>> euro=|$'€'|
>>> $ ksh -o convunicode -c 'export LC_ALL=en_US.ISO8859-15 ; printf
>>> "euro=|%q|\n" "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>>> euro=|$'\u[20ac]'|
>>> -- snip --
>>>
>>> Comments/rants/etc. welcome...
>>>
>>> ... and David/Glenn: Please don't remove the comments in the code if
>>> you take the patch... there's a reason why I'm quite verbose in the
>>> comments (short: Hideously complex and lots of traps in the code) ...
>>
>> Attached (as "astksh20130829_printf_w_gb18030_005.diff.txt") is a
>> fixed patch... the previous one missed a |continue;| statement which
>> caused failures in the "locale.sh" test module (found by Wang Shouhua)
>> ...
>
> Roland, thank you very much for the patch. I've been testing it the
> last couple of hours in both Japanese and Chinese environments and
> have to say: I am impressed. I can now address individual characters
> just by their hexadecimal Unicode value, and it works in any locale.
> This improves portability a lot and brings ksh93 into parity with perl.
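
To make point 2 of the quoted notes concrete, here is a rough sketch of
the byte-level difference. It is Python, used purely for illustration:
Python's codecs stand in for the |iconv()| conversion, and
encode_for_locale() is a made-up helper name, not anything from the
patch or from ksh93.

```python
# Illustration only: Python codecs stand in for the iconv() conversion
# discussed in the quoted notes. This is not ksh93 code.

EURO = "\u20ac"  # U+20AC EURO SIGN


def encode_for_locale(ch, charset):
    """Return the bytes for ch in charset, or None if charset has no mapping."""
    try:
        return ch.encode(charset)
    except UnicodeEncodeError:
        return None


# What a UTF-8 encoder produces: three bytes.
utf8_bytes = encode_for_locale(EURO, "utf-8")

# What an ISO8859-15 locale actually needs: the single byte 0xa4.
latin9_bytes = encode_for_locale(EURO, "iso8859-15")

# ISO8859-1 has no Euro sign at all, so there is nothing to emit.
latin1_bytes = encode_for_locale(EURO, "iso8859-1")

print(utf8_bytes.hex())    # e282ac
print(latin9_bytes.hex())  # a4
print(latin1_bytes)        # None

# Emitting the UTF-8 bytes in an ISO8859-15 locale yields three unrelated
# characters instead of one Euro sign.
print(len(utf8_bytes.decode("iso8859-15")))  # 3
```

Seen this way, writing the UTF-8 byte sequence into a single-byte locale
cannot give the right character, while a charset conversion either
finds the one correct byte or reports that the character is missing.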

I think it's even more impressive that \u[] can now be used in
en_GB.iso885915 (which is a single-byte locale) to pick characters if
the locale supports them. The *shame* is that single-byte locales like
en_GB.iso885915 were broken for such a long time. Kudos to Roland Mainz
for fixing the problems.

Lionel
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
