On Apr 6, 2015, at 09:19 , Gerriet M. Denkmann <[email protected]> wrote:
>
> Where is my bicycle gone? What am I doing wrong?
Before this thread heads further into outer space…
I suspect it [NSCharacterSet] is just broken. Look here, for example:
<http://stackoverflow.com/questions/23000812/creating-nscharacterset-with-unicode-smp-entries-testing-membership-is-this>
The problem is that it’s unclear whether the “characters” in NSCharacterSet are
internally UTF-16 code units, UTF-32 code units, Unicode code points, or
something else. According to the NSCharacterSet documentation:
> “An NSCharacterSet object represents a set of Unicode-compliant characters.”
and:
> “The NSCharacterSet class declares the programmatic interface for an object
> that manages a set of Unicode characters (see the NSString class cluster
> specification for information on Unicode).”
According to the NSString documentation:
> “A string object presents itself as an array of Unicode characters (Unicode
> is a registered trademark of Unicode, Inc.). You can determine how many
> characters a string object contains with the length method and can retrieve a
> specific character with the characterAtIndex: method.”
Working backwards, we know that the characters counted by ‘-[NSString length]’
are UTF-16 code units, so this all *possibly* implies that NSCharacterSet
characters are UTF-16 code units, too. Plus, back in the NSCharacterSet
documentation:
> “NSCharacterSet’s principal primitive method, characterIsMember:, provides
> the basis for all other instance methods in its interface.”
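If that’s true, ‘longCharacterIsMember:’ is pretty much screwed:
‘characterIsMember:’ takes a 16-bit unichar, so it cannot express the code
points above U+FFFF that ‘longCharacterIsMember:’ (which takes a UTF32Char)
exists to test.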
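To make the code unit vs. code point distinction concrete, here is a minimal
sketch in plain C (it also compiles as Objective-C). It is not Foundation API
and says nothing about how NSCharacterSet is actually implemented; the function
name utf16_encode is made up for illustration. It only shows the standard
UTF-16 arithmetic: anything above U+FFFF becomes two 16-bit units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units into out and returns how many were written,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;                /* out of range, or a lone surrogate */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;   /* BMP: fits in a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000;               /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

For example, U+1F6B2 (BICYCLE) comes out as the two units 0xD83D 0xDEB2, while
U+00E9 stays a single unit. A membership test that only ever sees one 16-bit
value at a time would be handed 0xD83D or 0xDEB2, never 0x1F6B2 itself.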
Perhaps the NSCharacterSet documentation is just wrong. Or perhaps, when the
API was enhanced in 10.2 (see:
<http://www.cocoabuilder.com/archive/cocoa/73297-working-with-32-bit-unicode-nsstring-stringwithutf32string-const-utf32char-bytes-needed.html>,
for some tantalizing hints about NSCharacterSet), the implementation was a
hack that works somehow but isn’t documented. I don’t think you’re going to get
any definitive answer except directly from Apple.
A suggestion, though:
Try building your character set using ‘characterSetWithRange:’ and/or the
NSMutableCharacterSet methods that add ranges, instead of using NSStrings.
Maybe NSCharacterSet really is UTF-32-based, but not (for code compatibility
reasons) when the set is built from NSStrings.