On Tue, Nov 3, 2009 at 2:47 AM, rmcguire <[email protected]> wrote:
> Charles Hixson <[email protected]> wrote:
>
>> Jesse Phillips wrote:
>>> On Sun, 01 Nov 2009 11:36:31 -0800, Charles Hixson wrote:
>>>
>>>> Does anyone just *know* the answer? (And if so, could they make the
>>>> documentation explicit?)
>>>
>>> I believe the documentation you are looking for is:
>>>
>>> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>>>
>>> It is more about understanding UTF than it is about learning strings.
>> Thanks, that does appear to be the answer.
>>
>> So if a string is too long, and I shorten it by one character, I'd
>> better test it with std.utf.validate(str). If it doesn't throw an
>> error, it's ok. Otherwise shorten it again and retry.
>>
>> I hope I understood this correctly. (I'm sure there's a more elegant
>> way to do this, but here I'm going for a simple approach, as I should
>> rarely be encountering this problem.)
>>
>>
> As far as I know if you want to shorten a utf8 string you just check the
> first bit of the last byte to see if its 0. If its 0 go back further
> until you find a byte that starts with 1, and then remove that byte too.
>
> All characters start with a byte that starts with 1, the number of 1s in
> the first byte of the character tell you how many bytes in the character.
>
> Hope that helps, but you should find a library that already has a
> "shorten my string" function.
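Instead of shortening and re-running validate in a loop, the cut point can be found directly. The test needs the top two bits rather than just the first: only bytes of the form 10xxxxxx are continuation bytes, so backing up over them lands on the first byte of a character. Below is a minimal sketch in Python rather than D, assuming the input is already valid UTF-8 held as raw bytes; the name truncate_utf8 is hypothetical, not a library function.

```python
def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Return the longest prefix of `data` that is at most `max_len` bytes
    long and does not end in the middle of a multi-byte UTF-8 sequence.

    Assumes `data` is already valid UTF-8.
    """
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    if len(data) <= max_len:
        return data
    cut = max_len
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), its character began earlier, so move
    # the cut back until data[cut] is the first byte of a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# "a" (1 byte), "é" (2 bytes), "€" (3 bytes): 6 bytes in total.
text = "a\u00e9\u20ac".encode("utf-8")
print(truncate_utf8(text, 4))  # prints b'a\xc3\xa9'
```

This cuts on code point boundaries only; a base letter and a following combining mark can still end up separated.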
It's explained well in Andrei's book.

0*  -- single byte character
11* -- first byte of multi-byte char
10* -- subsequent byte of multi-byte char

--bb
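The three bit patterns can be turned into a small byte classifier. For a lead byte, the number of leading 1 bits (110xxxxx, 1110xxxx, 11110xxx) gives the length of the sequence, which is the part of rmcguire's description that holds. This is a minimal Python sketch; the function names are hypothetical. It checks bit patterns only and does not reject every byte that can never appear in valid UTF-8 (for example 0xC0, 0xC1 and 0xF5 to 0xF7).

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a single byte (0..255) by its high bits."""
    if (b & 0x80) == 0x00:   # 0xxxxxxx
        return "single"
    if (b & 0xC0) == 0x80:   # 10xxxxxx
        return "continuation"
    return "lead"            # 11xxxxxx


def sequence_length(lead: int) -> int:
    """Bytes in the sequence that starts with `lead`, from its leading 1 bits."""
    if (lead & 0x80) == 0:
        return 1  # plain ASCII byte
    ones = 0
    while ones < 8 and lead & (0x80 >> ones):
        ones += 1
    if ones == 1 or ones > 4:
        # 10xxxxxx is a continuation byte; 5+ leading ones never start a sequence.
        raise ValueError(f"0x{lead:02X} does not start a UTF-8 sequence")
    return ones


for byte in "\u20ac".encode("utf-8"):  # the euro sign, E2 82 AC
    print(f"0x{byte:02X}", classify_utf8_byte(byte))
# prints:
# 0xE2 lead
# 0x82 continuation
# 0xAC continuation
print(sequence_length(0xE2))  # prints 3
```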
