Re: [ast-developers] [patch] Updated $'\w[hex]' patch for GB18030&&co. ... / was: Re: [patch] Accessing widechar codepoints without unicode (GB18030-related) ...

Wendy Lin Mon, 02 Sep 2013 08:11:59 -0700

On 2 September 2013 05:50, Roland Mainz <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 1:42 AM, Roland Mainz <[email protected]> wrote:
>> On Mon, Sep 2, 2013 at 1:09 AM, Roland Mainz <[email protected]> wrote:
>>> On Mon, Sep 2, 2013 at 1:06 AM, Roland Mainz <[email protected]> wrote:
>>>> On Mon, Sep 2, 2013 at 12:36 AM, Roland Mainz <[email protected]> wrote:
>>>>> On Mon, Aug 5, 2013 at 5:01 AM, Roland Mainz <[email protected]> wrote:
>>>>>> On Mon, Aug 5, 2013 at 4:13 AM, Roland Mainz <[email protected]> wrote:
>>> [snip]
>>>> ** More notes:
>>>> 1. $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "x\u[20ac]x\n" | iconv -f ISO8859-15 -t UTF-8'
>>>> now works; the correct output is "x€x".
>>>> 2. The reason this didn't work in the *002* patch was that the
>>>> original code in ast-ksh.2013-08-29 used |wc2utf8()| in an "extended
>>>> single-byte locale" like "en_US.ISO8859-15"; this can **never**
>>>> work because the locale is not UTF-8-based.
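A minimal shell sketch of why a UTF-8-only conversion cannot serve a single-byte locale, assuming a POSIX sh plus iconv(1) and od(1): U+20AC encodes to the three bytes e2 82 ac in UTF-8, while ISO-8859-15 represents the euro as the single byte a4, so the UTF-8 bytes still have to be mapped into the locale's charset. The helpers cp_to_utf8 and cp_to_charset are hypothetical names for illustration; this is not how the patch itself (C code inside ksh) is implemented.

```sh
#!/bin/sh
# Hypothetical helper: write Unicode codepoint $1 (hex, e.g. 20ac) as UTF-8.
# Covers U+0000..U+10FFFF; surrogates are not rejected (sketch only).
cp_to_utf8() {
    cp=$(( 0x$1 ))
    if [ "$cp" -lt 128 ]; then
        set -- "$cp"
    elif [ "$cp" -lt 2048 ]; then
        set -- $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        set -- $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
               $(( 0x80 | (cp & 0x3F) ))
    else
        set -- $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
               $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
    # Emit each byte through an octal escape in the printf format string.
    for b in "$@"; do
        printf "\\$(printf '%03o' "$b")"
    done
}

# Hypothetical helper: write codepoint $1 in charset $2 by going through
# UTF-8 and letting iconv do the mapping (fails if $2 lacks the character).
cp_to_charset() {
    cp_to_utf8 "$1" | iconv -f UTF-8 -t "$2"
}

cp_to_charset 20ac UTF-8 | od -An -tx1        # prints: e2 82 ac
cp_to_charset 20ac ISO-8859-15 | od -An -tx1  # prints: a4
```

Routing everything through UTF-8 and iconv keeps the charset knowledge in one place; a fuller version would presumably take the target charset from the current locale rather than from an argument.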
>>>>
>>>> Glenn/David: What do you think about the patch ?
>>>
>>> I forgot one note:
>>> - The patch _explicitly_ uses |iconv()| even for UTF-8 locales, to
>>> weed out unassigned codepoints and thereby fulfill the Unicode
>>> requirement that no unassigned codepoint should be accessible.
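The iconv(1) command-line tool can sketch the idea of letting iconv decide what is accessible: a failed conversion means the character cannot be produced in the target charset. Whether unassigned codepoints are rejected when the target is itself UTF-8 depends on the iconv implementation, so this example, which assumes a POSIX sh and iconv(1), instead uses a character that simply has no ISO-8859-15 equivalent. The representable helper is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of ksh or the patch): succeeds if the UTF-8
# text on stdin converts cleanly to charset $1, fails otherwise.
representable() {
    iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# U+20AC EURO SIGN (UTF-8: e2 82 ac) exists in ISO-8859-15 ...
if printf '\342\202\254' | representable ISO-8859-15; then
    echo "U+20AC: representable"
fi

# ... but U+4E2D (UTF-8: e4 b8 ad) has no ISO-8859-15 equivalent, so iconv
# reports a conversion error and exits with a non-zero status.
if ! printf '\344\270\255' | representable ISO-8859-15; then
    echo "U+4E2D: not representable"
fi
```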
>>
>> Last updated patch for tonight:
>>
>> Attached (as "astksh20130829_printf_w_gb18030_004.diff.txt") is an
>> updated version of the patch, which now automagically uses "\u[hex]"
>> instead of "\w[hex]" in the output for UTF-8 locales, making the output
>> 100% compatible with previous ksh93 versions except for the described
>> bugs in those versions.
>>
>> BTW: Here is some example usage of $ set -o convunicode (byte "a4" is
>> the euro character in ISO8859-15):
>> -- snip --
>> $ ksh -c 'export LC_ALL=en_US.ISO8859-15 ; printf "euro=|%q|\n" "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>> euro=|$'€'|
>> $ ksh -o convunicode -c 'export LC_ALL=en_US.ISO8859-15 ; printf "euro=|%q|\n" "$(printf "\xa4")" | iconv -f ISO8859-15 -t UTF-8'
>> euro=|$'\u[20ac]'|
>> -- snip --
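A small sketch, assuming iconv(1) and od(1), of the mapping behind the convunicode output: converting the locale byte to UTF-16BE reveals the codepoint that gets printed as \u[20ac]. The same byte a4 means U+00A4 CURRENCY SIGN in ISO-8859-1, which is why the conversion has to consult the locale's charset rather than treat bytes as codepoints.

```sh
#!/bin/sh
# Byte a4 in ISO-8859-15 is the euro sign, U+20AC.
printf '\244' | iconv -f ISO-8859-15 -t UTF-16BE | od -An -tx1   # prints: 20 ac

# The very same byte in ISO-8859-1 is U+00A4 CURRENCY SIGN.
printf '\244' | iconv -f ISO-8859-1 -t UTF-16BE | od -An -tx1    # prints: 00 a4
```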
>>
>> Comments/rants/etc. welcome...
>>
>> ... and David/Glenn: Please don't remove the comments in the code if
>> you take the patch... there's a reason why I'm quite verbose in the
>> comments (short: Hideously complex and lots of traps in the code) ...
>
> Attached (as "astksh20130829_printf_w_gb18030_005.diff.txt") is a
> fixed patch... the previous one missed a |continue;| statement which
> caused failures in the "locale.sh" test module (found by Wang Shouhua)
> ...


Roland, thank you very much for the patch. I've been testing it for the
last couple of hours in both Japanese and Chinese environments, and I
have to say: I am impressed. I can now address individual characters
just by their hexadecimal Unicode value, and it works in any locale.
This improves portability a lot and brings ksh93 to parity with Perl.

Thanks

Wendy
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
