Which charset is your terminal configured to use on each operating system?

On 1. 8. 2012., at 10:50, "Sebastian Reitenbach" <[email protected]> wrote:
>
> On Wednesday, August 1, 2012 05:16 CEST, Eric Wasylishen
> <[email protected]> wrote:
>
>> Hi,
>>
>> A while ago I added code to NSString.m to use ICU for the -compare: and
>> -rangeOfString: methods, so they're done correctly with respect to unicode
>> and locales, as well as tests that verify the behaviour matches Cocoa for
>> the most part.
>>
>> The -lowercaseString/-uppercaseString methods should probably use
>> u_strFoldCase if ICU is available.
>>
>> I'm skimming through the NSString API looking for methods that we should use
>> ICU for and currently don't (or don't implement), and there are only a
>> handful:
>>
>> -decomposedString* and -precomposedString* methods
>> -uppercase/lowercase/capitalized methods
>> -stringByFoldingWithOptions:locale:
>> -localizedStandardCompare:
>> -rangeOfComposedCharacterSequenceAtIndex:
>> -rangeOfComposedCharacterSequencesForRange:
>> -initWithFormat:locale: and friends perhaps? Maybe what we have now is fine
>> though, I'm not too familiar with it.
>>
>> I'd be willing to do the case folding ones at some point, for a start. :-)
>
> I "enhanced" my test program a bit, and compared output when running on Linux
> and OpenBSD:
>
> #import <Foundation/Foundation.h>
>
>
> int main(int argc, char *argv[]) {
> NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"]
> lowercaseString]);
>
> }
>
> running the test program on a Linux box in xterm (opensuse 11.3) without my
> patch:
> sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase
> 2012-08-01 08:49:57.972 lowercase[16574] autorelease called without pool for
> object (0x72db28) of class GSCInlineString in thread <NSThread: 0x6b0cc8>
> 2012-08-01 08:49:57.974 lowercase[16574] autorelease called without pool for
> object (0x72dce8) of class GSCInlineString in thread <NSThread: 0x6b0cc8>
> 2012-08-01 08:49:57.974 lowercase[16574] Lowercase: töÃst
> sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase
> 2012-08-01 08:50:09.500 lowercase[16584] autorelease called without pool for
> object (0x72d538) of class GSCInlineString in thread <NSThread: 0x6b06d8>
> 2012-08-01 08:50:09.501 lowercase[16584] autorelease called without pool for
> object (0x72d6f8) of class GSCInlineString in thread <NSThread: 0x6b06d8>
> 2012-08-01 08:50:09.501 lowercase[16584] Lowercase: töÖst
>
> logged in from the same Linux box, xterm, to the OpenBSD host I get (with and
> without my patch):
> $ LC_CTYPE='de_DE.UTF-8' ./lowercase
> 2012-08-01 10:38:52.850 lowercase[5483] autorelease called without pool for
> object (0x20c403f88) of class GSUnicodeInlineString in thread <NSThread:
> 0x20750be08>
> 2012-08-01 10:38:52.851 lowercase[5483] autorelease called without pool for
> object (0x209c1c5c8) of class GSUnicodeInlineString in thread <NSThread:
> 0x20750be08>
> 2012-08-01 10:38:52.852 lowercase[5483] Lowercase: tööst
> $ LC_CTYPE='en_EN.UTF-8' ./lowercase
> 2012-08-01 10:38:46.754 lowercase[32569] autorelease called without pool for
> object (0x20af26088) of class GSUnicodeInlineString in thread <NSThread:
> 0x2028f9308>
> 2012-08-01 10:38:46.756 lowercase[32569] autorelease called without pool for
> object (0x20444f248) of class GSUnicodeInlineString in thread <NSThread:
> 0x2028f9308>
> 2012-08-01 10:38:46.756 lowercase[32569] Lowercase: t��st
>
> The weird thing on Linux is that the second Ö is not lowercased, but on
> OpenBSD it is. Also, on Linux it's linked against icu4c.
> Even weirder is the LC_CTYPE: with DE it works on OpenBSD but not on
> Linux, and with EN it's the other way around?
>
> Sebastian
>
>
>>
>> Eric
>>
>> On Jul 31, 2012, at 3:40 PM, Stefan Bidi <[email protected]> wrote:
>>
>>> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
>>> <[email protected]> wrote:
>>>>
>>>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <[email protected]>
>>>> wrote:
>>>>
>>>>> Are you using GNUstep with or without ICU? When you say skipped, is it
>>>>> removed from the destination, or just passed through unmodified? Is your
>>>>> locale set to something that recognises letters with umlauts?
>>>>
>>>> It's with ICU, and I run OGo with
>>>> LC_CTYPE='de_DE.UTF-8'
>>>> so it's supposed to recognize Umlauts.
>>>>
>>>> I had some NSLog calls in the GSString lowercase code, and without my
>>>> patch, tolower returns 0 for an Umlaut, so it's not really skipped, but the
>>>> o->_contents.c[i] is set to 0 in the middle of a string :(
>>>>
>>>> My patch just checks if tolower returned 0, and if so just passes the
>>>> character it cannot handle through without doing anything with it.
>>>>
>>>> The following ICU is installed:
>>>> $ pkg_info | grep icu4c
>>>> icu4c-4.8.1.1 International Components for Unicode
>>>
>>> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
>>> class, but it isn't used very often). I looked into it over a year
>>> ago but decided against implementing something. The reason was
>>> that I didn't completely understand the code and at that point I
>>> had already started working on CFString, which I could freely break
>>> without anyone noticing.
>>>
>>> Stef
>>>
>>>>
>>>> GNUstep is from the latest releases, using libobjc from gcc 4.2.1, if that
>>>> matters.
>>>>
>>>> Sebastian
>>>>
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> With OGo, I convert a UTF-8 string to lowercase, using [NSString
>>>>>> lowercaseString]
>>>>>>
>>>>>> When there are Umlauts in the string, GNUstep just omits the
>>>>>> character.
>>>>>> I've no idea whether this is right or wrong, actually.
>>>>>>
>>>>>> With the patch to GSString attached below, it does not omit the character
>>>>>> anymore.
>>>>>>
>>>>>>
>>>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString
>>>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
>>>>>> lowercase
>>>>>>
>>>>>> cat lowercase.m
>>>>>> #import <Foundation/Foundation.h>
>>>>>>
>>>>>>
>>>>>> int main(int argc, char *argv[]) {
>>>>>> NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"]
>>>>>> lowercaseString]);
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Does running the program above on a Mac output the ö or omit it from the
>>>>>> string?
>>>>>>
>>>>>> Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>>>>>>
>>>>>> I don't have a Mac, so I cannot test myself; maybe the approach used
>>>>>> by OGo is also wrong.
>>>>>> At least the Apple docs say nothing about skipped characters,
>>>>>> only that, e.g., a ß may change to SS when using uppercaseString.
>>>>>> Since they mention the ß in the documentation, I'd expect
>>>>>> lowercaseString to handle other Umlauts too, or is that just a plain wrong
>>>>>> assumption?
>>>>>>
>>>>>> If someone could hit me with a cluestick, please ;)
>>>>>>
>>>>>> cheers,
>>>>>> Sebastian
>>>>>>
>>>>>> The patch to not omit Umlauts:
>>>>>> $OpenBSD$
>>>>>> --- Source/GSString.m.orig Tue Jul 31 18:31:36 2012
>>>>>> +++ Source/GSString.m Tue Jul 31 18:32:24 2012
>>>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>>>>>> while (i-- > 0)
>>>>>> {
>>>>>> o->_contents.c[i] = tolower(_contents.c[i]);
>>>>>> + if (o->_contents.c[i] == 0)
>>>>>> + o->_contents.c[i] = _contents.c[i];
>>>>>> }
>>>>>> o->_flags.wide = 0;
>>>>>> o->_flags.owned = 1; // Ignored on dealloc, but means we own buffer
>>>>>>
>>>>>> _______________________________________________
>>>>>> Discuss-gnustep mailing list
>>>>>> [email protected]
>>>>>> https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>>>>
>>>>> --
>>>>> This email complies with ISO 3103
