Hi,

A while ago I added code to NSString.m to use ICU for the -compare: and -rangeOfString: methods, so that they behave correctly with respect to Unicode and locales. I also added tests that verify the behaviour matches Cocoa for the most part.

The -lowercaseString/-uppercaseString methods should probably use u_strFoldCase if ICU is available. I'm skimming through the NSString API looking for methods that we should use ICU for and currently don't (or don't implement), and there are only a handful:

-decomposedString* and -precomposedString* methods
-uppercase/lowercase/capitalized methods
-stringByFoldingWithOptions:locale:
-localizedStandardCompare:
-rangeOfComposedCharacterSequenceAtIndex:
-rangeOfComposedCharacterSequencesForRange:
-initWithFormat:locale: and friends perhaps? Maybe what we have now is fine though, I'm not too familiar with it.

I'd be willing to do the case folding ones at some point, for a start. :-)

Eric

On Jul 31, 2012, at 3:40 PM, Stefan Bidi <[email protected]> wrote:

> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
> <[email protected]> wrote:
>>
>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <[email protected]>
>> wrote:
>>
>>> Are you using GNUstep with or without ICU? When you say skipped, is it
>>> removed from the destination, or just passed through unmodified? Is your
>>> locale set to something that recognises letters with umlauts?
>>
>> It's with ICU, and I run OGo with
>> LC_CTYPE='de_DE.UTF-8'
>> so it is supposed to recognize Umlauts.
>>
>> I had some NSLog in GSString lowercase, and without my patch, it returns 0
>> for an Umlaut, so it's not really skipped, but the
>> o->_contents.c[i] is set to 0 in the middle of a string :(
>>
>> My patch just checks if tolower returned 0, and then just passes the
>> character it cannot handle through without doing anything with it.
>>
>> The following ICU is installed:
>> $ pkg_info | grep icu4c
>> icu4c-4.8.1.1 International Components for Unicode
>
> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
> class, but it isn't used very often). I looked into it over a year
> ago but decided against implementing something.
> The reason was that I didn't completely understand the code and at
> that point I had already started working on CFString, which I could
> freely break without anyone noticing.
>
> Stef
>
>>
>> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if that
>> matters.
>>
>> Sebastian
>>
>>
>>>
>>> David
>>>
>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>>
>>>> Hi,
>>>>
>>>> with OGo, I convert a UTF-8 string to lowercase, using [NSString
>>>> lowercaseString]
>>>>
>>>> when there are Umlauts in the string, then GNUstep just omits the
>>>> character.
>>>> I've no idea whether this is right or wrong, actually.
>>>>
>>>> With the attached patch below to GSString it does not omit the character
>>>> anymore.
>>>>
>>>>
>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString
>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
>>>> lowercase
>>>>
>>>> cat lowercase.m
>>>> #import <Foundation/Foundation.h>
>>>>
>>>>
>>>> int main(int argc, char *argv[]) {
>>>>   NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"]
>>>>     lowercaseString]);
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>> Does running the above program on a Mac output the ö or omit it from the
>>>> string?
>>>>
>>>> Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>>>>
>>>> I don't have a Mac, so I cannot test it myself; maybe the approach used by
>>>> OGo could also be wrong.
>>>> At least when reading the Apple docs, there is nothing said about
>>>> skipped characters,
>>>> only that e.g. a ß may change to SS when using uppercaseString.
>>>> Since they mentioned the ß in the documentation, I'd expect
>>>> lowercaseString to handle other Umlauts too, or is that just a plain wrong
>>>> assumption?
>>>>
>>>> if someone could hit me with a cluestick please ;)
>>>>
>>>> cheers,
>>>> Sebastian
>>>>
>>>> the patch to not omit Umlauts.
>>>> $OpenBSD$
>>>> --- Source/GSString.m.orig Tue Jul 31 18:31:36 2012
>>>> +++ Source/GSString.m Tue Jul 31 18:32:24 2012
>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>>>>    while (i-- > 0)
>>>>      {
>>>>        o->_contents.c[i] = tolower(_contents.c[i]);
>>>> +      if (o->_contents.c[i] == 0)
>>>> +        o->_contents.c[i] = _contents.c[i];
>>>>      }
>>>>    o->_flags.wide = 0;
>>>>    o->_flags.owned = 1; // Ignored on dealloc, but means we own buffer
>>>>
>>>> _______________________________________________
>>>> Discuss-gnustep mailing list
>>>> [email protected]
>>>> https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>>
>>> --
>>> This email complies with ISO 3103
>>>

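The fallback in the quoted patch can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: lower_bytes_keep_unmapped is a hypothetical helper, not a GNUstep function, and it works on a plain byte buffer instead of GSString's _contents. It mirrors the patch by keeping the original byte whenever tolower() returns 0, which is the symptom reported in the thread. It does not lowercase non-ASCII letters; that would need a Unicode-aware routine such as the ICU-based approach discussed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of GNUstep): lowercase n bytes from src
 * into dst. If tolower() yields 0 for a byte, the original byte is copied
 * instead, mirroring the check added by the quoted patch.
 */
void lower_bytes_keep_unmapped(const unsigned char *src,
                               unsigned char *dst, size_t n)
{
    size_t i = n;

    assert(src != NULL && dst != NULL);

    while (i-- > 0) {
        /* src[i] is an unsigned char, so it is a valid tolower() argument. */
        int c = tolower(src[i]);

        /* Keep the input byte when the C library could not map it. */
        dst[i] = (c == 0) ? src[i] : (unsigned char)c;
    }
}
```

For the UTF-8 bytes of "Töst", this should lowercase the T and leave the two bytes of the ö as they are, provided tolower() does not remap those bytes in the active locale.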