On Jul 7, 2008, at 11:19 AM, Michael Ash wrote:

3) Look for a clean break in the UTF-8 sequence. This is not as
difficult as it sounds. There are two easy scenarios where you can
break. The first is after any ASCII character. You can scan your
NSMutableData buffer for any char value <= 127, and break at that
location. Second, you can break *before* any char value that matches
this mask:

    c & 0xA == 0xA

This will find a char whose first two bits are both 1. In UTF-8, this
denotes the first character in a multi-byte sequence, so you know that
if you break right before that location, it's a safe place.


A hexadecimal digit represents a nybble (4 bits), or half a byte. The mask for the highest-order 2 bits of a byte is therefore 0xC0, not 0xA (A == 10 == 1010b, C == 12 == 1100b). So, looking at <http://en.wikipedia.org/wiki/UTF-8>:

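(c & 0xC0) == 0xC0 -> you have the 1st char of a multi-byte sequence
(c & 0xC0) == 0x80 -> you have a later char of a multi-byte sequence
                   -> back up until you find a char of the 1st case
otherwise          -> you have a one-byte char

(The parentheses matter in C: == binds tighter than &, so without them the comparison is evaluated first.)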
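
The three cases above can be sketched in C roughly as follows. This is only an illustration, not code from the original thread: the function name utf8_safe_break is made up, and it assumes the buffer holds well-formed UTF-8 that may be cut off in the middle of its last character. It scans backward from the end and returns the offset at which the buffer can be split.

```c
#include <stddef.h>

/* Hypothetical helper (name is an assumption, not an existing API).
 * Returns an offset <= len at which buf can be split without cutting
 * a multi-byte UTF-8 sequence in half. Returns 0 if no such point
 * is found. */
size_t utf8_safe_break(const unsigned char *buf, size_t len)
{
    size_t i = len;

    while (i > 0) {
        unsigned char c = buf[i - 1];

        if ((c & 0xC0) == 0xC0)
            return i - 1;   /* 1st byte of a sequence: break before it */
        if ((c & 0xC0) != 0x80)
            return i;       /* one-byte (ASCII) char: break after it */
        i--;                /* later byte of a sequence: back up */
    }
    return 0;               /* only continuation bytes, or empty buffer */
}
```

Note that this is conservative: if the buffer ends with a complete multi-byte character, that character is still left for the next chunk, which is safe but not the latest possible break.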

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com
