On Apr 6, 2015, at 16:29 , pscott <psc...@skycoast.us> wrote:
> 
> But what you were describing *would* be UCS-2. To claim UTF-16 support, 
> variable length encoding must be handled.

It’s pretty much understood — on this list — that NSString is based on UTF-16, 
so we tend to cut the corner that’s bothering you. This is complicated by the 
fact that NSString is a bit weird. Its underlying representation is a UTF-16 
string, but its API is “array of UTF-16 code units”. That means you can create 
an invalid UTF-16 string with the NSString API, for example by splitting a 
surrogate pair. The fact that we’re not supposed to do that is also pretty 
much understood.
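
To make the "array of code units" point concrete, here is a minimal C sketch of a well-formedness check over a buffer of 16-bit units. It is not NSString's implementation; the function name utf16_is_well_formed is made up for illustration. The idea is only that an API indexed by code unit lets you cut a buffer between the two halves of a surrogate pair, and the result is no longer valid UTF-16.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any Apple API: returns true if the
   buffer contains no unpaired surrogate code units. */
bool utf16_is_well_formed(const uint16_t *units, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= count) {
                return false;
            }
            uint16_t next = units[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) {
                return false;
            }
            i++; /* skip the low half of the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return false;
        }
    }
    return true;
}
```

For example, U+1F600 is the pair D83D DE00 in UTF-16. The three-unit buffer { 0x0041, 0xD83D, 0xDE00 } passes the check, but the same buffer truncated to two units ends in a lone high surrogate and fails it.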

This messiness, along with the use of the ambiguous word “character” or 
“Unicode character” in the documentation, is all for historical reasons.

NSCharacterSet is something else again. We don’t actually know whether:

- it’s implemented as a set of UTF-16 code units, instead of code points

- it handles UTF-16 surrogate pairs properly, and if so, in which of its API 
methods

- it handles UTF-32 code units properly, and if so, in which of its API 
methods

- it has bugs that prevent it from doing what it’s intended to do, whatever 
that is

Greg has basically given us the answers: “not code units”, “possibly”, “it’s 
supposed to”, and “probably”. :)
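
The first question matters because a set keyed on 16-bit code units cannot contain anything outside the BMP. A rough C sketch of the difference, assuming a hypothetical set defined as one inclusive range of code points (the names cp_range, cp_range_contains, utf16_decode_pair and count_members are invented here, and none of this says how NSCharacterSet is actually implemented):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set: one inclusive range of code points. */
typedef struct {
    uint32_t first;
    uint32_t last;
} cp_range;

bool cp_range_contains(cp_range set, uint32_t code_point)
{
    return code_point >= set.first && code_point <= set.last;
}

/* Combine a valid high/low surrogate pair into one code point. */
uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
{
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10)
                    + ((uint32_t)low - 0xDC00u);
}

/* Count members of `set` in a UTF-16 buffer, decoding surrogate pairs
   first. An unpaired surrogate is tested as its own 16-bit value. */
size_t count_members(cp_range set, const uint16_t *units, size_t count)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = utf16_decode_pair(units[i], units[i + 1]);
            i++; /* consumed the low half as well */
        }
        if (cp_range_contains(set, cp)) {
            hits++;
        }
    }
    return hits;
}
```

With the range U+1F600 to U+1F64F, testing the units 0xD83D and 0xDE00 one at a time finds nothing, while decoding the pair first finds U+1F600. That is the same split NSCharacterSet exposes in its API: characterIsMember: takes a unichar, and longCharacterIsMember: takes a UTF32Char.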


