On Wednesday, August 1, 2012 11:49 CEST, Ivan Vučica <[email protected]> wrote:
> Which charset is your terminal configured to use on each operating system?
Sorry, I don't know how to figure that out.

Sebastian

>
> On 1. 8. 2012., at 10:50, "Sebastian Reitenbach"
> <[email protected]> wrote:
>
> >
> > On Wednesday, August 1, 2012 05:16 CEST, Eric Wasylishen
> > <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> A while ago I added code to NSString.m to use ICU for the -compare: and
> >> -rangeOfString: methods, so they're done correctly with respect to Unicode
> >> and locales, as well as tests that verify the behaviour matches Cocoa for
> >> the most part.
> >>
> >> The -lowercaseString/-uppercaseString methods should probably use
> >> u_strFoldCase if ICU is available.
> >>
> >> I'm skimming through the NSString API looking for methods that we should
> >> use ICU for and currently don't (or don't implement), and there are only a
> >> handful:
> >>
> >> -decomposedString* and -precomposedString* methods
> >> -uppercase/lowercase/capitalized methods
> >> -stringByFoldingWithOptions:locale:
> >> -localizedStandardCompare:
> >> -rangeOfComposedCharacterSequenceAtIndex:
> >> -rangeOfComposedCharacterSequencesForRange:
> >> -initWithFormat:locale: and friends perhaps? Maybe what we have now is
> >> fine though, I'm not too familiar with it.
> >>
> >> I'd be willing to do the case folding ones at some point, for a start. :-)
> >
> > I "enhanced" my test program a bit, and compared output when running on
> > Linux and OpenBSD:
> >
> > #import <Foundation/Foundation.h>
> >
> >
> > int main(int argc, char *argv[]) {
> >         NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"]
> > lowercaseString]);
> >
> > }
> >
> > running the test program on a Linux box in xterm (opensuse 11.3) without my
> > patch:
> > sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase
> > 2012-08-01 08:49:57.972 lowercase[16574] autorelease called without pool
> > for object (0x72db28) of class GSCInlineString in thread <NSThread:
> > 0x6b0cc8>
> > 2012-08-01 08:49:57.974 lowercase[16574] autorelease called without pool
> > for object (0x72dce8) of class GSCInlineString in thread <NSThread:
> > 0x6b0cc8>
> > 2012-08-01 08:49:57.974 lowercase[16574] Lowercase: töÃst
> > sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase
> > 2012-08-01 08:50:09.500 lowercase[16584] autorelease called without pool
> > for object (0x72d538) of class GSCInlineString in thread <NSThread:
> > 0x6b06d8>
> > 2012-08-01 08:50:09.501 lowercase[16584] autorelease called without pool
> > for object (0x72d6f8) of class GSCInlineString in thread <NSThread:
> > 0x6b06d8>
> > 2012-08-01 08:50:09.501 lowercase[16584] Lowercase: töÖst
> >
> > logged in from the same Linux box, xterm, to the OpenBSD host I get (with
> > and without my patch):
> > $ LC_CTYPE='de_DE.UTF-8' ./lowercase
> > 2012-08-01 10:38:52.850 lowercase[5483] autorelease called without pool for
> > object (0x20c403f88) of class GSUnicodeInlineString in thread <NSThread:
> > 0x20750be08>
> > 2012-08-01 10:38:52.851 lowercase[5483] autorelease called without pool for
> > object (0x209c1c5c8) of class GSUnicodeInlineString in thread <NSThread:
> > 0x20750be08>
> > 2012-08-01 10:38:52.852 lowercase[5483] Lowercase: tööst
> > $ LC_CTYPE='en_EN.UTF-8' ./lowercase
> > 2012-08-01 10:38:46.754 lowercase[32569] autorelease called without pool
> > for object (0x20af26088) of class GSUnicodeInlineString in thread
> > <NSThread: 0x2028f9308>
> > 2012-08-01 10:38:46.756 lowercase[32569] autorelease called without pool
> > for object (0x20444f248) of class GSUnicodeInlineString in thread
> > <NSThread: 0x2028f9308>
> > 2012-08-01 10:38:46.756 lowercase[32569] Lowercase: t��st
> >
> > The weird thing on Linux is that the second Ö is not lowercased, but on
> > OpenBSD it is. Also, on Linux it's linked against icu4c.
> > Even weirder is the LC_CTYPE behaviour: with DE it works on OpenBSD, but not
> > on Linux, and with EN it's the other way around?
> >
> > Sebastian
> >
> >>
> >> Eric
> >>
> >> On Jul 31, 2012, at 3:40 PM, Stefan Bidi <[email protected]> wrote:
> >>
> >>> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
> >>> <[email protected]> wrote:
> >>>>
> >>>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Are you using GNUstep with or without ICU? When you say skipped, is it
> >>>>> removed from the destination, or just passed through unmodified? Is
> >>>>> your locale set to something that recognises letters with umlauts?
> >>>>
> >>>> It's with ICU, and I run OGo with
> >>>> LC_CTYPE='de_DE.UTF-8'
> >>>> so it is supposed to recognize Umlauts.
> >>>>
> >>>> I had some NSLog in GSString lowercase, and without my patch, it returns
> >>>> 0 for an Umlaut, so it's not really skipped, but the
> >>>> o->_contents.c[i] is set to 0 in the middle of a string :(
> >>>>
> >>>> My patch just checks if tolower returned 0, and then just passes the
> >>>> character it cannot handle through without doing anything with it.
> >>>>
> >>>> following ICU is installed:
> >>>> $ pkg_info | grep icu4c
> >>>> icu4c-4.8.1.1       International Components for Unicode
> >>>
> >>> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
> >>> class, but it isn't used very often). I looked into it over a year
> >>> ago but decided against implementing something. The reason was
> >>> that I didn't completely understand the code and at that point I
> >>> had already started working on CFString, which I could freely break
> >>> without anyone noticing.
> >>>
> >>> Stef
> >>>
> >>>>
> >>>> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if
> >>>> that matters.
> >>>>
> >>>> Sebastian
> >>>>
> >>>>>
> >>>>> David
> >>>>>
> >>>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> with OGo, I convert a UTF-8 string to lowercase, using [NSString
> >>>>>> lowercaseString]
> >>>>>>
> >>>>>> when there are Umlauts in the string, then GNUstep just omits the
> >>>>>> character.
> >>>>>> I've no idea whether this is right or wrong actually.
> >>>>>>
> >>>>>> With the attached patch below to GSString it does not omit the
> >>>>>> character anymore.
> >>>>>>
> >>>>>>
> >>>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString
> >>>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
> >>>>>> lowercase
> >>>>>>
> >>>>>> cat lowercase.m
> >>>>>> #import <Foundation/Foundation.h>
> >>>>>>
> >>>>>>
> >>>>>> int main(int argc, char *argv[]) {
> >>>>>>         NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"]
> >>>>>> lowercaseString]);
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> Does running the program above on a Mac output the ö or omit it from
> >>>>>> the string?
> >>>>>>
> >>>>>> does it change when running with LC_CTYPE="C" or
> >>>>>> LC_CTYPE='de_DE.UTF-8' ?
> >>>>>>
> >>>>>> I don't have a Mac, so cannot test myself, maybe also the approach
> >>>>>> used by OGo could be wrong.
> >>>>>> At least when reading the Apple docs, there is nothing said about
> >>>>>> skipped characters,
> >>>>>> only that e.g. a ß may change to SS when using uppercaseString.
> >>>>>> Since they mentioned the ß in the documentation, I'd expect
> >>>>>> lowercaseString to handle other Umlauts too, or is that just a plain
> >>>>>> wrong assumption?
> >>>>>>
> >>>>>> if someone could hit me with a cluestick please ;)
> >>>>>>
> >>>>>> cheers,
> >>>>>> Sebastian
> >>>>>>
> >>>>>> the patch to not omit Umlauts.
> >>>>>> $OpenBSD$
> >>>>>> --- Source/GSString.m.orig  Tue Jul 31 18:31:36 2012
> >>>>>> +++ Source/GSString.m       Tue Jul 31 18:32:24 2012
> >>>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
> >>>>>>        while (i-- > 0)
> >>>>>>          {
> >>>>>>            o->_contents.c[i] = tolower(_contents.c[i]);
> >>>>>> +          if (o->_contents.c[i] == 0)
> >>>>>> +            o->_contents.c[i] = _contents.c[i];
> >>>>>>          }
> >>>>>>        o->_flags.wide = 0;
> >>>>>>        o->_flags.owned = 1;  // Ignored on dealloc, but means we own buffer
> >>>>>
> >>>>> --
> >>>>> This email complies with ISO 3103
> >>>>>

_______________________________________________
Discuss-gnustep mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnustep
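
A note on the patch quoted above: it keeps the original byte whenever tolower() returns 0. As a rough sketch only, and not GNUstep's actual code, a byte-wise fallback could instead lowercase just the ASCII letters and copy every other byte unchanged, so that the bytes of a multi-byte UTF-8 sequence are never passed to tolower() at all. The helper name ascii_lowercase is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not GNUstep code).
 * Lowercases the ASCII letters 'A'..'Z' in a byte buffer and copies
 * every other byte unchanged. Bytes >= 0x80, which is where all bytes
 * of multi-byte UTF-8 sequences fall, are therefore never modified. */
void ascii_lowercase(const unsigned char *src, unsigned char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c >= 'A' && c <= 'Z') {
            dst[i] = (unsigned char)(c + ('a' - 'A'));
        } else {
            dst[i] = c;  /* non-ASCII or non-letter byte: pass through */
        }
    }
}
```

With this approach the UTF-8 bytes of "TÖST" come out as "tÖst": the Ö survives intact but is not lowercased. Lowercasing non-ASCII letters needs a Unicode-aware routine, for example ICU's u_strToLower.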

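
On the question at the top of the thread, which charset the terminal session is configured to use: on many systems running `locale charmap` in the shell prints it. A minimal C sketch of the same check follows, assuming a POSIX system that provides <langinfo.h>; current_codeset is a hypothetical helper name.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: report the character encoding (codeset) of the
 * LC_CTYPE locale selected by the environment (LC_ALL, LC_CTYPE, LANG). */
const char *current_codeset(void)
{
    /* "" means: take the locale from the environment. If the requested
     * locale is not available, setlocale() returns NULL and the locale
     * that was active before (initially "C") stays in effect. */
    (void)setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);  /* e.g. "UTF-8" in a UTF-8 locale */
}
```

Printing the returned string from main() under each LC_CTYPE setting shows what the C library actually selected. If a requested locale such as en_EN.UTF-8 is not installed on a given system, the process stays in the "C" locale, which may be one reason the DE and EN runs differ.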