in multibyte locales the sort key data coders use strxfrm() to convert data 
for comparison
strxfrm'd data can be compared bytewise -- this also limits the number of 
multibyte calls used in comparisons: each key is transformed once, but may 
be subject to more than one comparison

use the script below to see what is happening
I eliminated the ascii codes (the loop starts at 0x0100) because of display 
and/or printf conflicts with some single byte chars
(like % and isspace chars)

x3 will have just the unicode hex codes in the locale-specific collation order
on my linux box, for the locale below, x3 (multibyte locale) order != x1 
(bytewise) order
--
export LC_ALL=en_US.UTF-8

typeset -li16 i

for (( i=0x0100 ; i < 0x3000 ; i++ ))
do
        printf "\u[${i#16#}]\t- %04x\n" $i
done > x1

sort < x1 > x2

sed 's/.*- //' < x2 > x3
--
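
a smaller way to see the same effect, as a rough sketch rather than part of 
the script above: sort four ascii letters once bytewise (LC_ALL=C) and once 
under the locale. the file name demo_in and the variable names are made up 
for illustration, and the locale result depends on en_US.UTF-8 being 
installed, so no particular output is claimed for it.

```shell
# hypothetical scratch file, not one of x1/x2/x3 from the script above
demo_in=${TMPDIR:-/tmp}/collate_demo.$$
printf '%s\n' b A a B > "$demo_in"

# bytewise order: the C locale compares raw byte values,
# so uppercase (0x41, 0x42) sorts before lowercase (0x61, 0x62)
bytewise=$(LC_ALL=C sort < "$demo_in" | paste -s -d ' ' -)

# locale collation order: assumes en_US.UTF-8 exists on this system;
# if it does not, sort typically falls back to bytewise order
localized=$(LC_ALL=en_US.UTF-8 sort < "$demo_in" 2>/dev/null | paste -s -d ' ' -)

rm -f "$demo_in"

echo "bytewise:  $bytewise"
echo "localized: $localized"
```

if the two lines printed differ, the sort in use is applying the locale's 
collation rather than comparing bytes.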

On Thu, 14 Mar 2013 11:39:07 +0100 Cedric Blancher wrote:
> Does AST sort on Linux support sorting of multibyte characters on Linux?
> I tried to sort multibyte characters like in the test script below:
> ----------cut----------
> % cat genunicodelist.sh
> typeset -li16 i

> rm x1 x2

> for (( i=1 ; i < 0x3000 ; i++ )) ; do
>       printf "\u[${i/~(El)16#/}]\t- %s\n" "$i"
> done >x1

> ~/bin/sort <x1 >x2
> ----------cut----------

> if I look at the contents of file x2 I see that the sorting appears to
> be based on the byte values:
> ----------cut----------
> ჽ       - 16#10fd
> ჾ       - 16#10fe
> ჿ       - 16#10ff
>         - 16#11
> ᄀ      - 16#1100
> ᄁ      - 16#1101
> ᄂ      - 16#1102
> ᄃ      - 16#1103
> ᄄ      - 16#1104
> ᄅ      - 16#1105
> ᄆ      - 16#1106
> ᄇ      - 16#1107
> ----------cut----------

> Is this a bug in AST sort or Linux? We are using opensuse 12.2 on a
> lenovo thinkpad and XENON servers.

> AST sort version:
> sort --version
>   version         sort (AT&T Research) 2010-08-11

> Ced
> -- 
> Cedric Blancher <[email protected]>
> Institute Pasteur
> _______________________________________________
> ast-developers mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-developers
