Micah Cowan <[EMAIL PROTECTED]> wrote:
> Jim Meyering wrote:
>> Here's a tentative patch that also avoids repeated
>> (and wasteful) initialization of the xlate array.
>
> I note that POSIX requires that, in the case that the arguments are
> exactly '[:lower:]' and '[:upper:]' (or the reverse of the same), tr is
> actually supposed to ignore the 'lower' and 'upper' character classes,
> and instead initialize the mapping from the locale's "tolower"/"toupper"
> definition. This would have avoided the length mismatch in the first
> place, and while that issue appears to be addressed, tr still does not
> conform to POSIX, as, if tr were to encounter a locale definition file
> with an LC_CTYPE category definition such as the following:
Thanks for the feedback.
However, you seem to be misinterpreting something.
GNU tr has always initialized its internal translation array using
the tolower and toupper functions. The problem I mentioned above
is that it was performing the correct initialization repeatedly.

>   upper A;...;Z
>   lower a;...;z
>   tolower (A,Z)
>   ...
> This would require
>   $ echo AAAA | tr '[:upper:]' '[:lower:]'
> to output "ZZZZ" (though it isn't even lowercased), rather than 'aaaa'.

GNU tr should work properly, even with such an odd locale -- as long
as it's a uni-byte one. See below.

> While the example above is, of course, contrived, there may well be
> locales where the tolower/toupper mappings differ from the longest
> possible mapping between the 'upper' and 'lower' classes.
>
> In fact, as it currently stands, I expect tr mishandles a case such as:
>   $ echo σιγμας | tr '[:lower:]' '[:upper:]'
> (Note the two variants of "sigma" in there, which both have a single
> corresponding capital letter; I'm afraid I can't actually verify this is
> broken, as my work desktop is not set up to compile coreutils, and I
> lack the time to correct this for now; the stock (old) tr on the system,
> running Fedora Core 6, silently passes it through without conversion.)

Your example uses multi-byte characters, and that is a separate issue.
Upstream GNU tr does not yet work with multi-byte characters.

If you can make tr misbehave, it'd be great to hear about it soon,
since I'm pretty close to being able to release a stable coreutils-6.10.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
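
For readers following the thread, here is a rough sketch of the kind of
initialization discussed above: a single-byte translation table filled
from toupper, and filled only once. This is not the coreutils source.
The helper names (init_lower_to_upper, translate) and the xlate_ready
flag are hypothetical, chosen only for illustration; only the array
name xlate comes from the message above.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define N_CHARS (UCHAR_MAX + 1)

/* One entry per possible single-byte value. */
static unsigned char xlate[N_CHARS];

/* Hypothetical flag: nonzero once the table has been filled. */
static int xlate_ready = 0;

/* Hypothetical helper: build the lower-to-upper map from the current
   locale's toupper, leaving every other byte mapped to itself.  The
   guard makes a second call a no-op, so the work is not repeated.  */
static void init_lower_to_upper(void)
{
    if (xlate_ready)
        return;
    for (int i = 0; i < N_CHARS; i++)
        xlate[i] = islower(i) ? (unsigned char) toupper(i)
                              : (unsigned char) i;
    xlate_ready = 1;
}

/* Hypothetical helper: translate a buffer in place, byte by byte.
   Multi-byte characters are not handled.  */
static void translate(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    init_lower_to_upper();
    for (size_t i = 0; i < len; i++)
        buf[i] = xlate[buf[i]];
}
```

In the default "C" locale, calling translate on the bytes of "abc XYZ"
leaves "ABC XYZ" in the buffer. Because the table is indexed by single
bytes, a sketch like this says nothing about multi-byte input such as
the sigma example quoted above.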