On Mar 21, 2013, at 6:05 PM, Andrew Thompson <[email protected]> wrote:

>
> On Mar 21, 2013, at 2:10 PM, Aki Inoue <[email protected]> wrote:
> 
>> For that matter, even in UTF-32 (aka UCS-4) it is not safe to pick a
>> truncation boundary just because it falls on a 4-byte boundary.
> 
> You're thinking of combining marks here?
Yes.

> It's generally claimed that one can multiply character offsets by 4 to index 
> into UCS-4 data… which I think I now see is only true depending on your 
> definition of character, i.e., whether one considers a decomposed sequence to 
> be one character or two.

> I see how truncation would be unsafe: you'd chop off the accents, etc.?
Yes.
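
To illustrate, here is a minimal Python sketch, not taken from any Cocoa API. The helper name truncate_code_points and the sample string are hypothetical. It assumes that "combining mark" can be approximated by a non-zero canonical combining class, which misses some marks and ignores the full grapheme-cluster rules, so treat it as a demonstration of the problem rather than a complete solution.

```python
import unicodedata

def truncate_code_points(text: str, max_code_points: int) -> str:
    """Truncate to at most max_code_points code points without separating
    a base character from the combining marks that follow it.

    Simplification (assumption): a code point counts as a combining mark
    only if its canonical combining class is non-zero.
    """
    if len(text) <= max_code_points:
        return text
    end = max_code_points
    # If the first dropped code point is a combining mark, the cut falls
    # inside a sequence; back up until the cut sits before the base character.
    while end > 0 and unicodedata.combining(text[end]) != 0:
        end -= 1
    return text[:end]

# "café" in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"
utf32 = decomposed.encode("utf-32-le")  # 5 code points, 4 bytes each

# Naive truncation at a 4-byte boundary keeps 'e' but drops its accent.
naive = utf32[: 4 * 4].decode("utf-32-le")

# Mark-aware truncation drops the whole 'e' + accent sequence instead.
safe = truncate_code_points(decomposed, 4)
```

The naive cut lands on a valid 4-byte boundary and still changes the text, because the fifth code point belongs to the fourth. Multiplying an offset by 4 indexes code points, not user-perceived characters.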

Aki

_______________________________________________

Cocoa-dev mailing list ([email protected])