On Tue, 02 Oct 2012 13:15:07 +0200 Michal Hlavinka wrote:
> On 10/01/2012 07:56 PM, David Korn wrote:
> > cc: ast-developers@research.att.com
> > Subject: Re: [ast-developers] string <> comparison in [[ ]]
> > --------
> >
> >> One of our users complains about a string comparison result:
> >>
> >> var1="4000"
> >> var2=" 500"
> >> [[ $var2 > $var1 ]]; echo $?
> >> 0
> >> [[ "$var2" > "$var1" ]]; echo $?
> >> 0
> >>
> >>
> >> Based on ksh man page:
> >> string1 < string2
> >>     True, if string1 comes before string2 based on ASCII value of their
> >> characters.
> >>
> >> Space is ASCII 32, '4' is ASCII 52, thus space is less than 4. So, as
> >> per ASCII value comparison, "4000" should have been greater than " 500",
> >> whereas the result shows " 500" is greater.
> >>
> >> Oddly, old bash gave the "correct" result, but the new one gives the
> >> "wrong" one, the same as ksh, so I wonder if I am missing something.
> >>
> >> Michal
> >>
> >
> > I get 1 and 1 when I run this.
> >
> > $ uname -a
> > Linux terra.research.att.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 13:47:21 UTC 2012 x86_64 x86_64 i386-64 GNU/Linux
> > $ print $LC_ALL
> > C

> Interesting. Is there some reason why locale affects it?

> It seems to depend on encoding. For my Czech locale:
> $ export LC_ALL=cs_CZ.utf-8
> $ [[ "$var2" > "$var1" ]]; echo $?
> 0
> $ export LC_ALL=cs_CZ.iso-8859-2
> $ [[ "$var2" > "$var1" ]]; echo $?
> 0
> $ export LC_ALL=cs_CZ.iso-8859-1
> $ [[ "$var2" > "$var1" ]]; echo $?
> 1

> so the result is incorrect for "usual" Czech encodings (iso-8859-2 and 
> utf-8 have all Czech letters, iso-8859-1 does not)

> for English locale:
> $ export LC_ALL=en_US.utf-8
> $ [[ "$var2" > "$var1" ]]; echo $?
> 0
> $ export LC_ALL=en_US.iso-8859-1
> $ [[ "$var2" > "$var1" ]]; echo $?
> 0
> $ export LC_ALL=en_US.iso-8859-2
> $ [[ "$var2" > "$var1" ]]; echo $?
> 1

> the result is again incorrect for usual encodings.

> Could someone explain to me what is happening here?
> My knowledge about locales seems to be insufficient.

Use this C program to see how the locale affects strcoll():
--
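#include <locale.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char** argv)
{
        char*           a;
        char*           b;

        /* adopt the locale named by the environment (LC_ALL, LC_COLLATE, LANG) */
        setlocale(LC_ALL, "");
        /* take the arguments in pairs; strcoll() is <0, 0 or >0, like strcmp() */
        while ((a = *++argv) && (b = *++argv))
                printf("%d  '%s'  '%s'\n", strcoll(a, b), a, b);
        return 0;
}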
--
