I hope it is OK to send this along; I am not sure how to proceed.
Please let me know whether I should append this to the original bug report,
open a new bug, or do something else.

Further research shows that bash behaves very badly with bracket matching.
HOWEVER, the problem apparently lies with strcoll().

I built bash with some debug printfs.
 One goes just before the second RANGECMP test in lib/glob/sm_loop.c
 (around line 467):
-----------------------------------------------------------------------
printf("enter rangecmp x 2; cstart %x, cend %x, test %x\n", cstart, cend, test);
      if (RANGECMP (test, cstart) >= 0 && RANGECMP (test, cend) <= 0) {
        goto matched;
      }
-----------------------------------------------------------------------
 Another goes just before strcoll() in rangecmp() in lib/glob/smatch.c
 (around line 72):
-----------------------------------------------------------------------
  s1[0] = a1;
  s2[0] = a2;
  printf ("++> strcoll(a, b) a = %s (0x%x), b = %s (0x%x), result = %x\n",
          s1, s1[0], s2, s2[0], strcoll (s1, s2));
  if ((ret = strcoll (s1, s2)) != 0) {
    return ret;
  }
-----------------------------------------------------------------------
Running the resulting shell with LC_COLLATE=en_US.UTF-8 in the bash source
root and executing "ls [HIJKLMNO]*" from within that shell gives, in part,
this output:
-----------------------------------------------------------------------
enter rangecmp x 2; cstart 48, cend 48, test 6c
++> strcoll(a, b) a = l (0x6c), b = H (0x48), result = 4
++> strcoll(a, b) a = l (0x6c), b = H (0x48), result = 4
enter rangecmp x 2; cstart 49, cend 49, test 6c
++> strcoll(a, b) a = l (0x6c), b = I (0x49), result = 3
++> strcoll(a, b) a = l (0x6c), b = I (0x49), result = 3
enter rangecmp x 2; cstart 4a, cend 4a, test 6c
++> strcoll(a, b) a = l (0x6c), b = J (0x4a), result = 2
++> strcoll(a, b) a = l (0x6c), b = J (0x4a), result = 2
enter rangecmp x 2; cstart 4b, cend 4b, test 6c
++> strcoll(a, b) a = l (0x6c), b = K (0x4b), result = 1
++> strcoll(a, b) a = l (0x6c), b = K (0x4b), result = 1
enter rangecmp x 2; cstart 4c, cend 4c, test 6c
++> strcoll(a, b) a = l (0x6c), b = L (0x4c), result = fffffff9
enter rangecmp x 2; cstart 4d, cend 4d, test 6c
++> strcoll(a, b) a = l (0x6c), b = M (0x4d), result = ffffffff
enter rangecmp x 2; cstart 4e, cend 4e, test 6c
++> strcoll(a, b) a = l (0x6c), b = N (0x4e), result = fffffffe
enter rangecmp x 2; cstart 4f, cend 4f, test 6c
++> strcoll(a, b) a = l (0x6c), b = O (0x4f), result = fffffffd
-----------------------------------------------------------------------

Note the result from strcoll() for the test of "l" against "L". I would have
thought it should return 0 (and, by the way, the sequence of return values in
this output seems to bear that out logically), but it actually returns
fffffff9. That value is returned consistently when a lowercase letter is
compared against its uppercase equivalent.
The ONLY time I see 0 returned from strcoll() is when the character is an
EXACT match (including case).
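
The collating order can also be inspected without a debug libc. The sketch
below is not part of the original investigation; it relies on sort(1)
ordering lines by the collating sequence of the active locale, which is the
ordering strcoll() exposes. The variable names are made up for the example,
and it assumes the en_US.UTF-8 locale is installed. If the second line comes
out with the cases interleaved, then "l" lies between "K" and "M" in that
locale, which would be consistent with the nonzero "l" versus "L" result
above.

```shell
# Six one-letter lines, upper and lower case mixed.
letters='k K l L m M'

# C locale: plain byte order, so every capital sorts before every
# lowercase letter.
c_order=$(printf '%s\n' $letters | LC_ALL=C sort | tr -d '\n')
echo "C:           $c_order"

# en_US.UTF-8 (assumed to be installed): the result depends on the
# locale data shipped with the libc, so no expected value is given here.
us_order=$(printf '%s\n' $letters | LC_ALL=en_US.UTF-8 sort | tr -d '\n')
echo "en_US.UTF-8: $us_order"
```

Only the C-locale line is predictable everywhere; the other one is what is
worth comparing between systems.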

It's been a while since I messed with this stuff. If I can remember how to
build a debug version of libc and run some tests with it, I will. Maybe I am
doing something wrong, but the bash response is clearly bad.
Some quick examples of the inconsistencies (using out-of-the-box bash; the
locale makes no difference unless it is C):
-----------------------------------------------------------------------
root@debmicro:/usr/local/src/bash-4.2# ls [K-M]*
list.c      lsignames.h  make_cmd.c  Makefile.in     mksignames.o
list.o      mailcheck.c  make_cmd.h  MANIFEST     mksyntax
locale.c  mailcheck.h  make_cmd.o  MANIFEST.doc  mksyntax.c
locale.o  mailcheck.o  Makefile    mksignames

lib:
glob  intl  malloc  readline  sh  termcap  tilde
root@debmicro:/usr/local/src/bash-4.2#
----------------------------------------------------------------------

root@debmicro:/usr/local/src/bash-4.2# ls [k-m]*
list.c      locale.o     mailcheck.h  make_cmd.h    mksignames.o
list.o      lsignames.h  mailcheck.o  make_cmd.o    mksyntax
locale.c  mailcheck.c  make_cmd.c   mksignames    mksyntax.c

lib:
glob  intl  malloc  readline  sh  termcap  tilde

root@debmicro:/usr/local/src/bash-4.2#
----------------------------------------------------------------------
root@debmicro:/usr/local/src/bash-4.2# ls [KLM]*
Makefile  Makefile.in  MANIFEST  MANIFEST.doc

root@debmicro:/usr/local/src/bash-4.2#
----------------------------------------------------------------------
root@debmicro:/usr/local/src/bash-4.2# ls [klm]*
list.c      locale.o     mailcheck.h  make_cmd.h    mksignames.o
list.o      lsignames.h  mailcheck.o  make_cmd.o    mksyntax
locale.c  mailcheck.c  make_cmd.c   mksignames    mksyntax.c

lib:
glob  intl  malloc  readline  sh  termcap  tilde
root@debmicro:/usr/local/src/bash-4.2#

-----------------------------------------------------------------------
Clearly, we should get the same results for [K-M]* and [KLM]*, but they differ 
wildly.

If the system is supposed to ignore case, the differences between [K-M]* and
[k-m]* are problematic.
With a lowercase range, the matches come out as one might expect (*without*
ignoring case).
This comes down to the way strcoll() behaves and the way the rangecmp call is
set up in sm_loop.c.
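
For anyone who wants to reproduce the comparison outside the bash source
tree, here is a minimal sketch. It uses a scratch directory with a few of the
file names from the listings above; the variable names are invented for the
example, and en_US.UTF-8 is assumed to be installed. Under the C locale the
range and the explicit list should agree. Under en_US.UTF-8 the outcome
depends on the libc locale data and on the bash version, so the script only
prints it.

```shell
# Scratch directory holding a few of the names from the listings above.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch list.c locale.c mksyntax.c Makefile MANIFEST

# C locale: ranges follow byte values, so [K-M] means exactly K, L and M.
LC_ALL=C
export LC_ALL
range_c=$(echo [K-M]*)
list_c=$(echo [KLM]*)
echo "C       [K-M]*: $range_c"
echo "C       [KLM]*: $list_c"

# en_US.UTF-8 (assumed installed): the range may also pick up lowercase
# names, depending on the libc locale data and the bash version.
LC_ALL=en_US.UTF-8
echo "en_US   [K-M]*:" [K-M]*
echo "en_US   [KLM]*:" [KLM]*

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Setting LC_ALL=C (or LC_COLLATE=C) for the one command is also the usual way
to get byte-order ranges when that is what a script needs.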

I am mainly concerned because I don't know how severe the effects of this
strcoll() behavior could be.

Apologies if I am being a nuisance; I am just trying to help out a bit.

Thanks and regards -
Bruce.



________________________________
 From: Jonathan Nieder <[email protected]>
To: Bruce Gayliard <[email protected]> 
Cc: [email protected] 
Sent: Tuesday, June 18, 2013 12:38 AM
Subject: Re: Looks like the underlying issue is the default locale
 

reassign 712592 libc6
forcemerge 333953 712592
quit

Hi Bruce,

Bruce Gayliard wrote:

> After doing a little research on this I found that strcoll(),
> called at the end of rangecmp(), was treating lower and
> upper cases equally.
> It appears that the default locale, en_US.UTF-8, is the real
> culprit.

Thanks for investigating.  Merging with a related report.

Hope that helps,
Jonathan
