On Fri, 31 Dec 2004 22:16:39 -0500 (EST)
Henry Spencer <[EMAIL PROTECTED]> wrote:

> On Fri, 31 Dec 2004, Michael B Allen wrote:
> > > mbtowc/towupper approach isn't really sufficient -- for example, a
> > > case change can alter the length of the string.
> > 
> > Dear god please tell me you're mistaken. Please provide an example?
> 
> The classic example is that U+00DF, the German eszett, is a lowercase
> letter whose uppercase equivalent is the two-letter group "SS".

Actually this isn't as bad as I thought. My main concern was that my
caseless comparison of filesystem paths would be affected. But it just
calls mbrtowc and towupper on each character and compares the results.
So this will actually work because the comparison is done on wide
characters and nothing is written back to the string.
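
For reference, a minimal sketch of that kind of comparison. The name
path_casecmp and its return convention are hypothetical, made up for
illustration, and this is not the actual code in question. It assumes both
strings are null terminated and encoded in the current locale. Note that
towupper maps one wide character to one wide character, so a pair that
differs only by a one-to-many mapping (eszett versus "SS") compares as
different rather than corrupting anything.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: caseless comparison of two multibyte strings.
 * Decodes one character at a time from each string and compares the
 * towupper() results. Nothing is written back to either string.
 * Returns 0 if equal, 1 if different, -1 on an invalid sequence. */
int
path_casecmp(const char *a, const char *b)
{
    const char *alim, *blim;
    mbstate_t sa, sb;
    wchar_t wa, wb;
    size_t na, nb;

    assert(a != NULL && b != NULL);
    alim = a + strlen(a) + 1; /* limits include the terminator */
    blim = b + strlen(b) + 1;
    memset(&sa, 0, sizeof sa);
    memset(&sb, 0, sizeof sb);

    for (;;) {
        na = mbrtowc(&wa, a, (size_t)(alim - a), &sa);
        nb = mbrtowc(&wb, b, (size_t)(blim - b), &sb);
        if (na == (size_t)-1 || na == (size_t)-2 ||
            nb == (size_t)-1 || nb == (size_t)-2) {
            return -1; /* invalid or truncated sequence */
        }
        if (towupper((wint_t)wa) != towupper((wint_t)wb)) {
            return 1;
        }
        if (na == 0) {
            return 0; /* both strings reached the terminator together */
        }
        a += na;
        b += nb;
    }
}
```

The two strings may advance by different byte counts per character, which
is why a change in encoded length does not matter here.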

I'm also upcasing usernames to do a caseless comparison, but I think the
worst-case scenario, based on the code just posted, is that the comparison
afterward will simply fail. That is, unless the now-corrupted string
somehow makes its way into some database and mucks things up. Instead of:

    n = mbrtowc(&wc, str, slim - str, &psw);
    if ((wcu = towupper(wc)) != wc) {
        if (wcrtomb(str, wcu, &psm) == (size_t)-1) {
            return -1;
        }
    }
    str += n; /* oops! truncated char or not long enough */

it should just check to make sure wcrtomb returns the same value as
mbrtowc:

    n = mbrtowc(&wc, str, slim - str, &psw);
    if ((wcu = towupper(wc)) != wc) {
        if (wcrtomb(str, wcu, &psm) != n) {
            return -1; /* didn't convert back to same size as lowercase! */
        }
    }
    str += n;
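
One remaining wrinkle is that wcrtomb has already written into str by the
time the lengths are compared, so a longer result could still clobber the
bytes that follow. A rough sketch of a more defensive loop, assuming it is
acceptable to encode into a scratch buffer first (upcase_inplace is a
hypothetical name, and the mbrtowc error checks are my addition):

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upcase the multibyte string in [str, slim) in place.
 * Returns 0 on success, -1 if a sequence is invalid or an uppercase form
 * does not encode to the same number of bytes as the original. */
int
upcase_inplace(char *str, char *slim)
{
    mbstate_t psw, psm;
    char tmp[MB_LEN_MAX];
    wchar_t wc, wcu;
    size_t n, m;

    assert(str != NULL && slim != NULL && str <= slim);
    memset(&psw, 0, sizeof psw);
    memset(&psm, 0, sizeof psm);

    while (str < slim) {
        n = mbrtowc(&wc, str, (size_t)(slim - str), &psw);
        if (n == (size_t)-1 || n == (size_t)-2) {
            return -1; /* invalid or truncated sequence */
        }
        if (n == 0) {
            break; /* reached the terminating null byte */
        }
        wcu = (wchar_t)towupper((wint_t)wc);
        if (wcu != wc) {
            /* Encode into a scratch buffer first so that a longer result
             * can never overwrite the bytes that follow in str. */
            m = wcrtomb(tmp, wcu, &psm);
            if (m == (size_t)-1 || m != n) {
                return -1; /* didn't convert back to the same size */
            }
            memcpy(str, tmp, n);
        }
        str += n;
    }
    return 0;
}
```

On failure the characters before the offending one have already been
upcased, so a caller that cares should work on a copy.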

> Another example is that some precomposed combinations of letter and accent
> (e.g. U+0149, apostrophe-n) exist in only one case and must be mapped to a
> longer sequence when case changes. 
> 
> There might also -- I'm not sure -- be some titlecase letter combinations
<snip>
> The mbtowc/towupper scheme also fails in situations where case mapping is
> context-dependent
<snip>

Are these combinations common in usernames or pathnames?

> > > ...more context:  why do you want to do this, as part of what? 
> > 
> > I just want to upcase or downcase a string.

So far I just want to upcase a username and do caseless comparison
of pathnames.

Thanks for your response. I'm glad I asked. This list is always very
helpful.

Mike

-- 
Greedo shoots first? Not in my Star Wars.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
