Hello,

Converting runes to a string encodes them as UTF-8, which is a 
variable-length encoding: each rune takes between one and four bytes, 
depending on its code point. The leading byte you noted is part of that 
encoding and signals how many bytes the sequence has (although the ranges 
in your message don't seem to be correct).
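
Here is a minimal sketch of what is going on, assuming only the standard 
unicode/utf8 package. The helper name utf8Hex is made up for illustration 
and is not from your code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Hex returns the UTF-8 encoding of r as a hex string.
// Hypothetical helper, named here for illustration only.
func utf8Hex(r rune) string {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r) // n is the number of bytes written
	return fmt.Sprintf("%x", buf[:n])
}

func main() {
	// Code points on either side of the points where the encoded length
	// or the lead byte changes.
	samples := []rune{0x7f, 0x80, 0xa9, 0xbf, 0xc0, 0xff, 0x100, 0x7ff, 0x800, 0x10000}
	for _, r := range samples {
		fmt.Printf("U+%04X  %d byte(s)  %s\n", r, utf8.RuneLen(r), utf8Hex(r))
	}

	// Converting a rune slice to a string performs the same encoding.
	s := string([]rune{'©'})
	fmt.Printf("%q as bytes: %x\n", s, s) // the bytes are c2a9

	// Converting back decodes the bytes into the original rune.
	fmt.Println([]rune(s)[0] == '©')
}
```

So string(runes) is already the usual way to convert a slice of runes to a 
string. The extra byte is part of the encoding of a single rune, not an 
extra character, and converting back with []rune(s) returns the original 
values.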

Robert


On Tuesday, 28 February 2017 12:29:07 UTC-5, Fraser Hanson wrote:
>
> https://play.golang.org/p/05wZM9BhfB
>
> I'm working on some code that reads UTF-32 and converts it to Go strings. 
> I'm finding some surprising behavior when casting slices of runes to 
> strings.
>
>  runes := []rune{'©'}
>  fmt.Printf(" cast to string: (%s)\n", string(runes))
>  fmt.Printf("bytes in string: (%x)\n", string(runes))
> Output:
>
>  cast to string: (©)
> bytes in string: (c2a9) // <-- where's the C2 byte coming from??
>
>
> The weird part is that casting the rune slice to a string causes it to 
> pick up an additional leading character. 
>
> runes 0x00-0x7f get nothing prepended.
> runes 0x80-0xbf get a leading c2 byte as seen above.
> runes 0xc0-0xff get a leading c3 byte.
> rune 0x100 gets a leading c4 byte.  Seems like a pattern here.
>
> The same thing happens if I add the runes into a bytes.Buffer with 
> WriteRune(), then print it out with bytes.Buffer.String().
>
> Can anyone explain this?  
> What's the correct way to convert a slice of runes into a string?
>
