In multibyte locales the sort key coders use strxfrm() to convert the key
data for comparison.
strxfrm'd data can be compared bytewise, which also limits the number of
multibyte calls made during comparisons: each key is transformed once, even
though it may take part in more than one comparison.
Use this script to see what is happening.
I eliminated the ASCII codes because of display and/or printf conflicts with
some single-byte chars (like % and isspace chars).
x3 will have just the Unicode hex codes in the locale-specific collation order.
On my Linux box, for the locale below, the x3 (multibyte locale) order != the
x1 (bytewise) order.
--
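export LC_ALL=en_US.UTF-8
typeset -li16 i    # integer kept in base 16; expands as 16#xxxx
for (( i=0x0100 ; i < 0x3000 ; i++ ))
do
	# \u[hex] emits the character; ${i#16#} strips the 16# base prefix
	printf "\u[${i#16#}]\t- %04x\n" $i
done > x1          # x1: generated in codepoint (bytewise) order
sort < x1 > x2     # x2: sorted under the locale's collation
sed 's/.*- //' < x2 > x3    # x3: just the hex codes, in x2's order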
--
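A quick way to tell whether a sorted file is just in byte order is to ask
sort itself, using the POSIX -c (check only) option under the C locale.
This is only a sketch: the function name is_bytewise_sorted and the
throwaway file are made up for the example and are not part of AST sort.

```shell
# is_bytewise_sorted FILE
# Succeeds (exit 0) when FILE is already in plain byte order, i.e. the
# order sort gives when no locale collation is applied.
# The function name is hypothetical, chosen for this example.
is_bytewise_sorted() {
    LC_ALL=C sort -c "$1" 2>/dev/null
}

# Small self-contained demonstration with a throwaway file.
tmp=${TMPDIR:-/tmp}/bytewise.$$
printf 'A\nB\na\nb\n' > "$tmp"
if is_bytewise_sorted "$tmp"
then
    result="byte order"
else
    result="not byte order"
fi
echo "$result"
rm -f "$tmp"
```

After running the script above, "is_bytewise_sorted x2" should fail if sort
really applied the locale's collation, and succeed if the output is merely
in byte order.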
On Thu, 14 Mar 2013 11:39:07 +0100 Cedric Blancher wrote:
> Does AST sort on Linux support sorting of multibyte characters on Linux?
> I tried to sort multibyte characters like in the test script below:
> ----------cut----------
> % cat genunicodelist.sh
> typeset -li16 i
> rm x1 x2
> for (( i=1 ; i < 0x3000 ; i++ )) ; do
> printf "\u[${i/~(El)16#/}]\t- %s\n" "$i"
> done >x1
> ~/bin/sort <x1 >x2
> ----------cut----------
> if I look at the contents of file x2 I see that the sorting appears to
> be based on the byte values:
> ----------cut----------
> ჽ - 16#10fd
> ჾ - 16#10fe
> ჿ - 16#10ff
> - 16#11
> ᄀ - 16#1100
> ᄁ - 16#1101
> ᄂ - 16#1102
> ᄃ - 16#1103
> ᄄ - 16#1104
> ᄅ - 16#1105
> ᄆ - 16#1106
> ᄇ - 16#1107
> ----------cut----------
> Is this a bug in AST sort or Linux? We are using opensuse 12.2 on a
> lenovo thinkpad and XENON servers.
> AST sort version:
> sort --version
> version sort (AT&T Research) 2010-08-11
> Ced
> --
> Cedric Blancher <[email protected]>
> Institute Pasteur
> _______________________________________________
> ast-developers mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-developers