Bill Baxter <[email protected]> wrote:
> On Tue, Nov 3, 2009 at 2:47 AM, rmcguire <[email protected]> wrote:
>> Charles Hixson <[email protected]> wrote:
>>
>>> Jesse Phillips wrote:
>>>> On Sun, 01 Nov 2009 11:36:31 -0800, Charles Hixson wrote:
>>>>
>>>>> Does anyone just *know* the answer? (And if so, could they make the
>>>>> documentation explicit?)
>>>>
>>>> I believe the documentation you are looking for is:
>>>>
>>>> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>>>>
>>>> It is more about understanding UTF than it is about learning strings.
>>> Thanks, that does appear to be the answer.
>>>
>>> So if a string is too long, and I shorten it by one character, I'd
>>> better test it with std.utf.validate(str). If it doesn't throw an
>>> error, it's ok. Otherwise shorten it again and retry.
>>>
>>> I hope I understood this correctly. (I'm sure there's a more elegant
>>> way to do this, but here I'm going for a simple approach, as I should
>>> rarely be encountering this problem.)
>>>
>>>
>> As far as I know if you want to shorten a utf8 string you just check the
>> first bit of the last byte to see if its 0. If its 0 go back further
>> until you find a byte that starts with 1, and then remove that byte too.
>>
>> All characters start with a byte that starts with 1, the number of 1s in
>> the first byte of the character tell you how many bytes in the character.
>>
>> Hope that helps, but you should find a library that already has a
>> "shorten my string" function.
>
> It's explained well in Andrei's book.
> 0* -- single byte character
> 11* -- first byte of multi-byte char
> 10* -- subsequent byte of multi-byte char
>
> --bb
>

:) Forgot about that, it's been a while since I played with UTF-8.
I made a Hessian serializer in C.

-Rory
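
For what it's worth, here is a minimal sketch in C of shortening a string using the bit patterns Bill listed (0*, 11*, 10*). The names utf8_is_continuation and utf8_truncated_length are made up for illustration; they are not from Phobos or any other library. The sketch assumes the input is already valid UTF-8 and only works out where a cut may safely fall. It does not validate anything.

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a "subsequent byte of a multi-byte char", i.e. 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: returns the largest n <= max_bytes such that the
 * first n bytes of s end on a character boundary.
 * Assumption: s[0..len) is valid UTF-8.
 */
static size_t utf8_truncated_length(const char *s, size_t len, size_t max_bytes)
{
    assert(s != NULL);

    /* Nothing to cut: the whole string already fits. */
    if (max_bytes >= len)
        return len;

    size_t n = max_bytes;

    /*
     * s[n] is the first byte that would be dropped. If it is a
     * continuation byte (10xxxxxx), the cut falls inside a character,
     * so back up until s[n] is a single-byte character (0xxxxxxx) or a
     * lead byte (11xxxxxx), or until nothing is left.
     */
    while (n > 0 && utf8_is_continuation((unsigned char)s[n]))
        n--;

    return n;
}
```

For example, with the 6-byte string "h\xC3\xA9llo" ("héllo") and max_bytes = 2, the cut would land inside the two-byte "é", so the function backs up and returns 1. With max_bytes = 3 the cut already sits on a boundary and 3 is returned unchanged.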
