On Apr 6, 2015, at 09:19, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
> 
> Where is my bicycle gone? What am I doing wrong?

Before this thread heads further into outer space…

I suspect it [NSCharacterSet] is just broken. Look here, for example:

http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this

The problem is that it’s unclear whether the “characters” in NSCharacterSet are 
internally UTF-16 code units, UTF-32 code units, Unicode code points, or 
something else. According to the NSCharacterSet documentation:

> “An NSCharacterSet object represents a set of Unicode-compliant characters.”


and:

> “The NSCharacterSet class declares the programmatic interface for an object 
> that manages a set of Unicode characters (see the NSString class cluster 
> specification for information on Unicode).”


According to the NSString documentation:

> “A string object presents itself as an array of Unicode characters (Unicode 
> is a registered trademark of Unicode, Inc.). You can determine how many 
> characters a string object contains with the length method and can retrieve a 
> specific character with the characterAtIndex: method.”


Working backwards, we know that the characters counted by ‘-[NSString 
length]’ are UTF-16 code units, so this all *possibly* implies that 
NSCharacterSet characters are UTF-16 code units, too. Plus, back in the 
NSCharacterSet documentation:

> “NSCharacterSet’s principal primitive method, characterIsMember:, provides 
> the basis for all other instance methods in its interface.”


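If that’s true, ‘longCharacterIsMember:’ is pretty much screwed, since 
‘characterIsMember:’ takes a 16-bit unichar and cannot even express a code 
point above U+FFFF.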

Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the 
API was enhanced in 10.2 (see 
http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html 
for some tantalizing hints about NSCharacterSet), the implementation was a 
hack that works somehow but isn’t documented. I don’t think you’re going to get 
any definitive answer except directly from Apple.

A suggestion, though:

Try building your character set using ‘characterSetWithRange:’ and/or the 
NSMutableCharacterSet methods that add ranges (such as ‘addCharactersInRange:’), 
instead of using NSStrings. Maybe NSCharacterSet really is UTF-32-based 
internally, but not (for code-compatibility reasons) when it is built from NSStrings.




_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
