Re: The length of strings vs. # of chars vs. sizeof

Charles Hixson Tue, 03 Nov 2009 18:10:31 -0800

Bill Baxter wrote:

On Tue, Nov 3, 2009 at 2:47 AM, rmcguire<[email protected]>  wrote:

Charles Hixson<[email protected]>  wrote:

Jesse Phillips wrote:

On Sun, 01 Nov 2009 11:36:31 -0800, Charles Hixson wrote:

Does anyone just *know* the answer?  (And if so, could they make the
documentation explicit?)


I believe the documentation you are looking for is:

http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

It is more about understanding UTF than it is about learning strings.

Thanks, that does appear to be the answer.

So if a string is too long, and I shorten it by one character, I'd
better test it with std.utf.validate(str).  If it doesn't throw an
error, it's ok.  Otherwise shorten it again and retry.

I hope I understood this correctly.  (I'm sure there's a more elegant
way to do this, but here I'm going for a simple approach, as I should
rarely be encountering this problem.)
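
A minimal sketch of that validate-and-retry idea in D, assuming the original string is valid UTF-8 to begin with. The name shortenByValidate is made up for illustration and is not part of Phobos; only std.utf.validate and UTFException come from the library.

```d
import std.utf : validate, UTFException;

// Hypothetical helper, not part of Phobos: returns the longest prefix of s
// that is at most maxBytes code units long and passes std.utf.validate.
string shortenByValidate(string s, size_t maxBytes)
{
    size_t cut = s.length < maxBytes ? s.length : maxBytes;
    while (cut > 0)
    {
        try
        {
            validate(s[0 .. cut]); // throws UTFException on malformed UTF-8
            break;                 // the prefix ends on a character boundary
        }
        catch (UTFException e)
        {
            --cut;                 // drop one more byte and retry
        }
    }
    return s[0 .. cut];
}
```

Each retry rescans the whole prefix, so this is simple rather than fast, which seems fine if the case is rare.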

As far as I know, if you want to shorten a UTF-8 string you just check the
first bit of the last byte to see if it's 0. If it's 1, go back further
until you find a byte that starts with 11, and then remove that byte too.

Every multi-byte character starts with a byte that begins with 11; the number
of leading 1s in that first byte tells you how many bytes are in the character.
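
For what it's worth, the lead-byte rule can be sketched in D roughly like this. It is only an illustration: sequenceLength is a made-up name, and Phobos already provides std.utf.stride for real use. Note that a byte starting with 0 is a complete one-byte character, so it is handled as its own case.

```d
// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte b, or 0 if b cannot start a sequence.
size_t sequenceLength(ubyte b)
{
    if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx: single byte (ASCII)
    if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx: two-byte sequence
    if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx: three-byte sequence
    if ((b & 0xF8) == 0xF0) return 4; // 11110xxx: four-byte sequence
    return 0;                         // 10xxxxxx continuation, or invalid
}
```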

Hope that helps, but you should find a library that already has a
"shorten my string" function.


It's explained well in Andrei's book.
0* -- single byte character
11* -- first byte of multi-byte char
10* -- subsequent byte of multi-byte char
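
Going by those three patterns, shortening to a byte limit could look something like the sketch below. The name shortenAtBoundary is hypothetical, and the code assumes the input is already valid UTF-8; it does no validation of its own.

```d
// Hypothetical helper: longest prefix of s that fits in maxBytes and does
// not split a multi-byte character. Assumes s is already valid UTF-8.
string shortenAtBoundary(string s, size_t maxBytes)
{
    if (s.length <= maxBytes) return s;
    size_t cut = maxBytes;
    // s[cut] is the first byte that would be dropped. While it is a
    // continuation byte (10xxxxxx), the cut falls inside a character,
    // so back up until it lands on a single-byte or lead byte.
    while (cut > 0 && (s[cut] & 0xC0) == 0x80)
        --cut;
    return s[0 .. cut];
}
```

Since a character is at most four bytes long, the loop backs up at most three bytes on valid input, and nothing has to be decoded or rescanned.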

--bb

Thanks.  That's a much better answer.
