[go-nuts] understanding utf-8 for a newbie

rob solomon Fri, 05 May 2017 18:16:57 -0700

Hi. I decided to write a small program in Go to convert UTF-8 text to simple ASCII. The need arose when I copied a file created on Ubuntu 16.04 amd64 to a Windows 10 computer.
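A minimal sketch of such a converter, assuming the input fits in memory and that only the quote, apostrophe, and em dash characters discussed in this post (plus the left single quote) need mapping. The helper name `toSimpleASCII` and the choice of `--` for the em dash are illustrative assumptions, not a standard; `strings.NewReplacer` performs all substitutions in one pass.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// asciiReplacer maps a few typographic characters to plain ASCII.
// The mapping is an illustrative assumption; extend it as needed.
var asciiReplacer = strings.NewReplacer(
	"\u201C", `"`, // left double quotation mark
	"\u201D", `"`, // right double quotation mark
	"\u2018", "'", // left single quotation mark
	"\u2019", "'", // right single quotation mark (used as apostrophe)
	"\u2014", "--", // em dash
)

// toSimpleASCII is a hypothetical helper that applies the mapping above.
// Characters not in the mapping are passed through unchanged.
func toSimpleASCII(s string) string {
	return asciiReplacer.Replace(s)
}

func main() {
	// Read all of standard input, convert it, and write it to standard output.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if _, err := io.WriteString(os.Stdout, toSimpleASCII(string(data))); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Used as a filter, for example `go run conv.go < input.txt > output.txt`.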

I decided to start with the curly double quotes, the apostrophe, and the em dash. Using hexdump -C in Ubuntu, the byte sequences for these characters in the file are:


open quote = 0xE2809C

close quote = 0xE2809D

apostrophe = 0xE28099

emdash = 0xE28094
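The same byte sequences can be reproduced from Go without hexdump. This sketch (not part of the original post) writes each character as a `\u` escape in a string literal, which Go stores as UTF-8, and prints its bytes; `utf8Bytes` is a hypothetical helper name.

```go
package main

import "fmt"

// utf8Bytes formats the UTF-8 bytes of s as space-separated uppercase hex,
// similar to the byte columns printed by hexdump -C.
func utf8Bytes(s string) string {
	return fmt.Sprintf("% X", s)
}

func main() {
	// A \uXXXX escape in a Go string literal is stored as the UTF-8
	// encoding of that code point.
	chars := []struct{ name, s string }{
		{"open quote", "\u201C"},
		{"close quote", "\u201D"},
		{"apostrophe", "\u2019"},
		{"emdash", "\u2014"},
	}
	for _, c := range chars {
		fmt.Printf("%-12s = %s\n", c.name, utf8Bytes(c.s))
	}
}
```

For the open quote this prints `E2 80 9C`, the same bytes hexdump reports.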

However, when I write a simple program that uses the routines in unicode/utf8 to display these runes from the file, I get very different values. I do not understand this.


open quote = 0x201C

close quote = 0x201D

apostrophe = 0x2019

emdash = 0x2014
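The program described above is not shown in the post. A minimal sketch of what it might look like, assuming the file contents are already in a string (for a real file, `os.ReadFile` would supply them): each call to `utf8.DecodeRuneInString` consumes one UTF-8 sequence and returns the decoded code point plus the number of bytes it used. `decodeRunes` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes splits s into runes using utf8.DecodeRuneInString and
// returns each rune together with the bytes that encoded it.
func decodeRunes(s string) ([]rune, []string) {
	var runes []rune
	var encodings []string
	for len(s) > 0 {
		// r is the code point; size is how many bytes of s encoded it.
		// Invalid input yields utf8.RuneError with size 1, so the loop
		// always makes progress.
		r, size := utf8.DecodeRuneInString(s)
		runes = append(runes, r)
		encodings = append(encodings, s[:size])
		s = s[size:]
	}
	return runes, encodings
}

func main() {
	// The same bytes hexdump shows: E2 80 9C, E2 80 9D, E2 80 99, E2 80 94.
	s := "\xE2\x80\x9C\xE2\x80\x9D\xE2\x80\x99\xE2\x80\x94"
	runes, encodings := decodeRunes(s)
	for i, r := range runes {
		fmt.Printf("bytes % X -> rune U+%04X\n", encodings[i], r)
	}
}
```

Each three-byte sequence decodes to a single rune, so the sketch prints both numbers side by side for every character.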

Why are the runes returned by utf8.DecodeRuneInString different from what hexdump shows when inspecting the file directly?
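One way to see how the two sets of numbers relate is to decode a three-byte sequence by hand. In UTF-8, code points from U+0800 to U+FFFF (excluding the surrogate range) are stored as `1110xxxx 10xxxxxx 10xxxxxx`, and concatenating the `x` bits gives the code point; for `E2 80 9C` that is `0010` + `000000` + `011100` = 0x201C. The sketch below applies this rule with a hypothetical `decode3` helper that does no validation, and compares it with `utf8.DecodeRune`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode3 decodes a three-byte UTF-8 sequence by hand. It assumes the
// bytes already form a valid three-byte sequence and does no checking.
func decode3(b0, b1, b2 byte) rune {
	return rune(b0&0x0F)<<12 | // low 4 bits of the lead byte
		rune(b1&0x3F)<<6 | // low 6 bits of the first continuation byte
		rune(b2&0x3F) // low 6 bits of the second continuation byte
}

func main() {
	seqs := [][3]byte{
		{0xE2, 0x80, 0x9C}, // open quote
		{0xE2, 0x80, 0x9D}, // close quote
		{0xE2, 0x80, 0x99}, // apostrophe
		{0xE2, 0x80, 0x94}, // emdash
	}
	for _, b := range seqs {
		manual := decode3(b[0], b[1], b[2])
		lib, _ := utf8.DecodeRune(b[:])
		fmt.Printf("% X -> U+%04X (utf8.DecodeRune: U+%04X)\n", b[:], manual, lib)
	}
}
```

In other words, hexdump shows the encoded bytes, while the unicode/utf8 routines return the code point those bytes represent.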


--rob solomon
