Re: [Mono-list] ASCII bytes to string?

Jonathan Pryor Thu, 10 Jan 2013 12:41:00 -0800

On Jan 10, 2013, at 1:28 PM, mickeyf <[email protected]> wrote:
> The string itself displays as expected, but shows a length of twice the 
> number of characters, as if String.Length is reporting the number of bytes 
> (UTF16) rather than the number of Unicode characters in the string.


In all likelihood, the string contains non-printable characters. Consider this 
`csharp` snippet:

        csharp> var b = new byte[]{(byte) 'a', (byte) 'b', 0, 0, 0, 0};
        csharp> var s = System.Text.Encoding.UTF8.GetString(b);
        csharp> s.Length
        6
        csharp> s;
        "ab"

So this is more or less exactly what you're describing; `s` _clearly_ has two 
characters, yet s.Length is 6!

Except `s` doesn't have two characters:

        csharp> s[3];
        '\x0'

There's some null data in there: our source byte array `b` contained null 
bytes, and a System.String can hold ASCII NUL characters, so all six bytes were 
converted. The NULs simply aren't displayed.

You can confirm or rule this out by checking what `buffFromDrv` actually 
contains and whether it has any non-printable data (e.g. ASCII NUL).
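
If it helps, here is a rough sketch of one way to do that check: dump the buffer as hex and count the bytes outside the printable ASCII range. `BufferInspector` and its method names are made up for illustration, and the array in `Main` is only a stand-in for whatever your driver really returns.

```csharp
using System;
using System.Linq;

static class BufferInspector
{
    // Hex dump of the whole buffer, e.g. "61-62-00-00".
    public static string ToHex(byte[] buffer)
    {
        return BitConverter.ToString(buffer);
    }

    // Count bytes outside the printable ASCII range 0x20..0x7E.
    // Note that this also counts tabs and newlines as non-printable.
    public static int CountNonPrintable(byte[] buffer)
    {
        return buffer.Count(x => x < 0x20 || x > 0x7E);
    }

    static void Main()
    {
        // Stand-in for the real driver buffer (assumption).
        var buffFromDrv = new byte[] { (byte) 'a', (byte) 'b', 0, 0, 0, 0 };
        Console.WriteLine(ToHex(buffFromDrv));
        Console.WriteLine(CountNonPrintable(buffFromDrv));
    }
}
```

A run of trailing `00` values in the dump would point to the buffer being converted past the end of the real data.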

Assuming that's the case, what you need to do is not convert "extra" data:

        byte[] buffFromDrv = new byte [BIG_ENOUGH];
        int bytesRead = stream.Read(buffFromDrv, readPosition, bytesToRead);
        // Decode only the bytes Read() actually stored, from the offset it wrote to.
        string s = System.Text.Encoding.UTF8.GetString(buffFromDrv, readPosition, bytesRead);

Or for the above `csharp` snippet:

        csharp> var s = System.Text.Encoding.UTF8.GetString(b, 0, 2);
        csharp> s;
        "ab"
        csharp> s.Length;
        2
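
If you don't have a byte count to hand, a possible fallback is to stop at the first NUL byte. This is only a sketch, and it assumes the buffer is NUL-padded after the real data, which I can't confirm for your driver. `NulTerminated.Decode` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Text;

static class NulTerminated
{
    // Decode only the bytes before the first NUL.
    // If there is no NUL at all, decode the whole buffer.
    public static string Decode(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte) 0);
        if (end < 0)
            end = buffer.Length;
        return Encoding.UTF8.GetString(buffer, 0, end);
    }

    static void Main()
    {
        var b = new byte[] { (byte) 'a', (byte) 'b', 0, 0, 0, 0 };
        string s = Decode(b);
        Console.WriteLine(s);
        Console.WriteLine(s.Length);
    }
}
```

Using the count returned by the read is still the better option when you have it, since a NUL scan would truncate data that legitimately contains a zero byte.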

> The documentation for string.length says "number of characters", not "number 
> of bytes",

It's actually neither; String.Length is the number of UTF-16 "code units" 
stored in the string. This is _not_ the number of "characters" ("code points"), 
because a code point may require the use of a "surrogate pair", in which case 
it will take up two `char` values within the string:

        http://en.wikipedia.org/wiki/UTF-16

(Normally you don't need to care about this, except when you do...)
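
To make the code unit vs. code point difference concrete, here is a small sketch. `CodePoints.Count` is a hypothetical helper written for this example; it treats each valid surrogate pair as one code point and everything else as one code point per `char`.

```csharp
using System;

static class CodePoints
{
    // Count Unicode code points by stepping over the low half of each surrogate pair.
    public static int Count(string s)
    {
        int n = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                i++; // the pair counts as a single code point
            n++;
        }
        return n;
    }

    static void Main()
    {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so it needs a surrogate pair.
        string clef = char.ConvertFromUtf32(0x1D11E);
        Console.WriteLine(clef.Length);        // UTF-16 code units
        Console.WriteLine(Count(clef));        // code points
        Console.WriteLine(Count("ab" + clef)); // code points in a mixed string
    }
}
```

Here `clef.Length` is 2 while `Count(clef)` is 1, and the mixed string has a Length of 4 but only 3 code points.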

 - Jon

_______________________________________________
Mono-list maillist  -  [email protected]
http://lists.ximian.com/mailman/listinfo/mono-list
