On Wednesday, August 1, 2012 11:49 CEST, David Chisnall <[email protected]> 
wrote: 
 
> On 1 Aug 2012, at 09:50, Sebastian Reitenbach wrote:
> 
> > I "enhanced" my test program a bit, and compared output when running on 
> > Linux and OpenBSD:
> > 
> > #import <Foundation/Foundation.h>
> > 
> > 
> > int main(int argc, char *argv[]) {
> > NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"] 
> > lowercaseString]);
> > 
> > }
> 
> On closer inspection, there is a bug here, but not where you think it is...
> 
> Try this test case:
> 
> $ cat tolower.m
> #import <Foundation/Foundation.h>
> #import <wctype.h>
> 
> 
> int main(int argc, char *argv[]) {
>       [NSAutoreleasePool new];
>       NSString *l = [@"TöÖst" lowercaseString];
>       NSLog(@"Lowercase: %@", l);
>       NSLog(@"Lowercase: %s", [l UTF8String]);
>       for (int i=0 ; i<[l length] ; i++)
>       {
>               int c = [l characterAtIndex: i];
>               NSLog(@"%c %d", c,c);
>       }
> }
> $ clang tolower.m  -lgnustep-base
> $ ./a.out
> 2012-07-31 19:23:44.810 a.out[69751] Lowercase: t??st
> 2012-07-31 19:23:44.813 a.out[69751] Lowercase: tööst
> 2012-07-31 19:23:44.813 a.out[69751] t 116
> 2012-07-31 19:23:44.813 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] ? 246
> 2012-07-31 19:23:44.814 a.out[69751] s 115
> 2012-07-31 19:23:44.814 a.out[69751] t 116

I had to change it slightly to compile with gcc; the results still differ between the two systems:

#import <wctype.h>
#import <Foundation/Foundation.h>


int main(int argc, char *argv[]) {
        [NSAutoreleasePool new];
        int i=0;
        NSString *l = [@"TöÖst" lowercaseString];
        NSLog(@"Lowercase: %@", l);
        NSLog(@"Lowercase: %s", [l UTF8String]);
        for (i ; i<[l length] ; i++)
        {
                int c = [l characterAtIndex: i];
                NSLog(@"%c %d", c,c);
        }
}

On Linux:
sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase2 
2012-08-01 13:16:59.692 lowercase2[22437] Lowercase: töÖst
2012-08-01 13:16:59.694 lowercase2[22437] Lowercase: töÃst
2012-08-01 13:16:59.694 lowercase2[22437] t 116
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 182
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 150
2012-08-01 13:16:59.694 lowercase2[22437] s 115
2012-08-01 13:16:59.694 lowercase2[22437] t 116
sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase2 
2012-08-01 13:17:12.791 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] Lowercase: töÃst
2012-08-01 13:17:12.792 lowercase2[22441] t 116
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441] ¶ 182
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441]  150
2012-08-01 13:17:12.792 lowercase2[22441] s 115
2012-08-01 13:17:12.792 lowercase2[22441] t 116

On OpenBSD:
$ LC_CTYPE='en_EN.UTF-8' ./lowercase2
2012-08-01 13:18:25.497 lowercase2[5619] Lowercase: t��st
2012-08-01 13:18:25.502 lowercase2[5619] Lowercase: tööst
2012-08-01 13:18:25.502 lowercase2[5619] t 116
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] s 115
2012-08-01 13:18:25.502 lowercase2[5619] t 116
$ LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:18:32.743 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] t 116
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] s 115
2012-08-01 13:18:32.745 lowercase2[16814] t 116
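
For what it's worth, the values printed on Linux (195 182 and 195 150) are exactly the UTF-8 byte sequences for ö (246) and Ö (214), so there the string seems to hold one "character" per source byte. Here is a minimal sketch in plain C of that encoding step. utf8_encode_bmp is a hypothetical helper I made up for illustration; it is not part of GNUstep and only handles code points up to 0xFFFF, ignoring surrogates:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GNUstep function): encode one code point from
 * the Basic Multilingual Plane as UTF-8. Writes up to 3 bytes into out and
 * returns the number of bytes written. Surrogate values are not rejected. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    assert(out != NULL);
    if (cp < 0x80) {
        /* ASCII range: one byte, value unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* Two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Calling utf8_encode_bmp(246, buf) fills buf with 195 182, and utf8_encode_bmp(214, buf) gives 195 150, which matches the four non-ASCII values in the Linux output above.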


> 
> The error appears to be in converting the 16-bit Unicode string that is the 
> result of lowercaseString for display.  Note the values that are being 
> returned by characterAtIndex: these are the correct Unicode values, but 
> attempting to display them is failing because the terminal is expecting 
> UTF-8, not UTF-16 (and a lone byte 246 is not valid UTF-8).  It seems 
> that NSLog is just truncating the characters, rather than translating the 
> string into the encoding that the terminal expects.
> 
> David
> 
> -- Sent from my STANTEC-ZEBRA 
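
To illustrate the point about 246 not being valid UTF-8: a terminal in a UTF-8 locale that receives the single byte 246 (0xF6) cannot decode it, whereas the pair 195 182 decodes to ö. Below is a rough sketch in plain C of such a check. utf8_sequence_length is a hypothetical name of mine, not an existing libc or GNUstep function, and I am assuming the well-formedness rules of RFC 3629:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from libc or GNUstep): return the length in bytes
 * of the well-formed UTF-8 sequence starting at s, or 0 if the first n bytes
 * do not start with one. */
size_t utf8_sequence_length(const unsigned char *s, size_t n)
{
    size_t len, i;

    assert(s != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80)
        return 1;                          /* plain ASCII */
    else if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4)
        len = 4;
    else
        return 0;  /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */

    if (n < len)
        return 0;                          /* sequence is cut short */

    /* Every following byte must look like 10xxxxxx */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;

    /* Reject overlong forms, surrogates, and values above U+10FFFF */
    if (s[0] == 0xE0 && s[1] < 0xA0) return 0;
    if (s[0] == 0xED && s[1] > 0x9F) return 0;
    if (s[0] == 0xF0 && s[1] < 0x90) return 0;
    if (s[0] == 0xF4 && s[1] > 0x8F) return 0;

    return len;
}
```

With this, a buffer holding only the byte 246 yields 0 (not decodable), while a buffer holding 195 182 yields 2. That would fit the "?" shown for the %c output when the locale is UTF-8.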
 
 
 

_______________________________________________
Discuss-gnustep mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnustep