On Oct 18, 2010, at 10:19, Alex Kac wrote:

> What we are trying to do:
> Shorten the AM/PM to just the first character in Western Languages so that a 
> time is shown as "1:30a". 
> 
>       NSDateFormatter* formatter = [[NSDateFormatter alloc] init];
>       NSString* am = [[[formatter AMSymbol] substringToIndex:1] lowercaseString];
>       NSString* pm = [[[formatter PMSymbol] substringToIndex:1] lowercaseString];
> 
> 
> This works in Western languages just fine. However, in languages like Korean 
> it does not work, seemingly giving a random character. From reading this 
> list over time, I believe it's because I'm just getting one part of a 
> multi-part character (I'm no good with Unicode terms, sorry). 
> 
> My guess is I need to use rangeOfComposedCharacterSequenceAtIndex and then 
> get the range and use a substring with that range. But I'm not sure since my 
> knowledge here is pretty limited.

This description seems pretty good (and short):

        
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/stringsClusters.html

Basically, there are several nested levels of complexity:

1. UTF-16 units (which are the 16-bit values that are indexed by NSString's 
'...AtIndex:' methods)

2. Unicode code points (each of which is a single UTF-16 unit or a surrogate 
pair of UTF-16 units)

3. Composed character sequences (such as accented characters) made up of a 
base code point followed by one or more combining code points

4. Grapheme clusters, which are sequences of Unicode code points representing 
things that are written as a single unit (in some sense, depending on the 
language)

5. Related character sequences (I don't know if there's an official name for 
this) such as German 'ß' and 'SS' that figure into algorithms for sorting and 
case changing.

According to the above-linked page, #3 and #4 aren't really different.
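
To make the difference between levels 1 and 3 concrete, here is a minimal 
sketch that counts composed character sequences so the result can be compared 
with 'length', which counts UTF-16 units. The helper name 
'ComposedCharacterCount' is mine, not an API, and the sketch assumes 
'enumerateSubstringsInRange:options:usingBlock:' is available (10.6 or iOS 4 
and later).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences in 's',
// as opposed to [s length], which counts UTF-16 units.
static NSUInteger ComposedCharacterCount(NSString *s) {
    __block NSUInteger count = 0;
    [s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                          options:NSStringEnumerationByComposedCharacterSequences
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop) {
        // Called once per composed character sequence.
        count++;
    }];
    return count;
}
```

For @"e\u0301" ('e' followed by a combining acute accent) 'length' is 2 but 
the helper returns 1. The same holds for @"\U0001F600", a single code point 
stored as a surrogate pair.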

Also according to the above-linked page, 
'rangeOfComposedCharacterSequenceAtIndex:' does sound like the method to use. 
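
As a rough sketch of what that might look like (not tested against every 
locale), the 'substringToIndex:1' call in the original code could be replaced 
with a range lookup. 'ShortSymbol' is a name I made up for illustration; the 
empty-string check is my addition, since asking for index 0 of an empty 
string would raise an exception.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first composed character sequence of
// 'symbol', lowercased, or an empty string if 'symbol' is nil or empty.
static NSString *ShortSymbol(NSString *symbol) {
    if ([symbol length] == 0) {
        return @"";
    }
    // The returned range covers every UTF-16 unit of the sequence that
    // contains index 0, so a surrogate pair or a base character plus
    // combining marks is never split.
    NSRange first = [symbol rangeOfComposedCharacterSequenceAtIndex:0];
    return [[symbol substringWithRange:first] lowercaseString];
}
```

With the formatter from the original code, that would be 
'ShortSymbol([formatter AMSymbol])' and 'ShortSymbol([formatter PMSymbol])'. 
For "AM" it returns "a".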

It's not obvious that taking the first grapheme is going to be semantically 
meaningful in every language. For example, if the English abbreviations 
happened to be MA and MP, the first grapheme wouldn't distinguish them, so 
the assumption that the first character identifies the time range is not 
necessarily valid across all languages. But at least it's not going to give 
you an unrelated character.

