Pádraig Brady wrote: > On 25/09/10 07:53, Jim Meyering wrote: >> Eric Blake wrote: >>> On 09/24/2010 04:47 PM, Pádraig Brady wrote: >>>> I was just looking at a bug reported to fedora there where this abort()s >>>> >>>> $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]' >> >> Ouch! Thanks for reporting it here. >> How many more bugs lurk in tr... >> Consolation: this one is a failure to diagnose invalid inputs. > > I found a few more issues:
Nice catches. Thanks for all the work. Please mention the above abort-inducing case in the log along with the other three. While you're at it, please use LC_ALL= there, not LANG=, since the latter is a glibc-ism. > This valid translation spec aborted: > LANG=en_US tr '[:upper:]- ' '[:lower:]_' The above does not abort w/glibc when LC_ALL=C happens to be set. > This misaligned conversion spec was allowed: > LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]' > This misaligned spec was allowed by extending the class: > LANG=C tr '[:upper:] ' '[:lower:]' > ... > + tr: fix various issues with case conversion classes. In some locales > + valid conversion specifications caused tr to abort, while some invalid > + specifications were undiagnosed. > + [bugs introduced in coreutils 6.9.90 and 6.9.92] Two separate bugs! Thanks for tracking them down. I'm surprised they are so recent. I suppose that means I introduced both. Let's see... These two seem like the only candidates: 6efd10462d8103208f4575f0b5edddf841c7d87c af5d0c363a52e787a4311a11f035209eecdc4115 Whichever they are, please list them in the commit log. > ** New features > > cp now accepts the --attributes-only option to not copy file data, > diff --git a/src/tr.c b/src/tr.c > index a5b6810..3adc85f 100644 > --- a/src/tr.c > +++ b/src/tr.c > @@ -1177,6 +1177,77 @@ card_of_complement (struct Spec_list *s) > return cardinality; > } > > +/* Discard the lengths associated with a case conversion, > + as using the actual number of upper or lower case characters > + is problematic when they don't match in some locales. > + Also ensure the case conversion classes in string2 are > + aligned correctly with those in string1. > + Note POSIX says the behavior of `tr "[:upper:]" "[:upper:]"' > + is undefined. Therefore we allow it (unlike Solaris) > + and treat it as a no-op. */ > + > +static void > +validate_case_classes (struct Spec_list *s1, struct Spec_list *s2) > +{ > + size_t upper_chars = 0; > + size_t lower_chars = 0; Please name these n_upper and n_lower, so each is obviously a counter. Otherwise, seeing the name out of context might evoke an array. > + int i; Please make this "unsigned int", since it never needs negative values. > + int c1=0, c2=0; These days, I prefer one per line, and with spaces: int c1 = 0; int c2 = 0; Hmm.. I see you're following existing style that did "int c1, c2;", though in existing code, those didn't have initializers. And the "int i;" was essentially moved. And there are others. I blame the author ;-) (me, 18 years ago) > + count old_s1_len = s1->length; > + count old_s2_len = s2->length; > + struct List_element *s1_tail = s1->tail; > + struct List_element *s2_tail = s2->tail; > + bool s1_new_element = true; > + bool s2_new_element = true; ... > + > +# Before coreutils 8.6 the disparat number of upper and lower s/disparat/disparate/ > +# characters in some locales, triggered abort()s and invalid behavior > +if test "$(LC_ALL=en_US.ISO-8859-1 locale charmap 2>/dev/null)" = ISO-8859-1; > +then > + ( No big deal, but why run this in a subshell? It doesn't seem necessary here. > + export LC_ALL=en_US.ISO-8859-1 > + tr '[:upper:] ' '[:lower:]' < /dev/null 2>out && _fail=1 > + echo 'tr: when translating with string1 longer than string2, > +the latter string must not end with a character class' > exp > + compare out exp || _fail=1 > + > + # Ensure when there are a different number of elements > + # in each string, we validate the case mapping correctly > + tr 'ab[:lower:]' '0-1[:upper:]' < /dev/null || _fail=1 It might be nice to ensure that e.g., abc.xyz maps to ABC.XYZ. > + # Ensure we extend string2 appropriately > + tr '[:upper:]- ' '[:lower:]_' < /dev/null || _fail=1 Similarly, that ABC-XYZ maps to abc_xyz Thanks again! > + # Ensure the size of the case classes are accounted > + # for as a unit. > + echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | > + tr '[:upper:]A-B' '[:lower:]0' >out || _fail=1 > + echo '00cdefghijklmnopqrstuvwxyz' > exp > + compare out exp || _fail=1 ...