I18n, UTF-8, and Linux

Gil Glass Fri, 21 Oct 2005 08:07:02 -0700

Hello,

I am new to this list, so please forgive me if I'm covering old ground.

I am interested in displaying some text in languages other than English within my application. However, I'm having some difficulty when trying to display non-ASCII characters. Note that I use UTF-8 to display all characters, even those that can be represented in 8 bits (0x00 - 0xFF).

For example, if I want to display the character 'á' (that's an 'a' with an acute accent in case it doesn't show up on your browser), that's U+00E1 in Unicode-speak. Encoding that character as UTF-8, it comes out to be 0xC3 0xA1. If, in my .po file (for the GNU gettext() utilities), I include the following:

"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

...

#: TestProgram.cpp:145
msgid "it is"
msgstr "est\xC3\xA1"

what comes back is

est?

I know that the problem is not with text rendering as I can write the UTF-8 directly into the string in the program and it works fine, i.e. it displays the a with the accent.
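As a sanity check on the byte values quoted above, here is a minimal sketch of a UTF-8 encoder. The function name encode_utf8 is my own and hypothetical (it is not part of gettext or any library), and it assumes the input is a valid code point no larger than U+10FFFF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid code point (<= 0x10FFFF); no error checking is done.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encode_utf8(0x00E1) yields the two bytes 0xC3 0xA1, so the encoding itself does not appear to be where things go wrong.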

Any ideas of what I might be doing wrong? Note that I also tried typing the C3 and A1 characters directly (Ã¡) but that also doesn't work.

ANOTHER PROBLEM: If I want to display the word "mañana" for example, I would encode it as "ma\xC3\xB1ana". However, the "\xB1a" is considered to be a single hex number! How can I indicate that I want the byte \xB1 followed by the letter 'a'? Remember, I can't use formatting strings because I'm working with gettext(). Surely somebody has run into this before!!
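For comparison, in C++ source code (as opposed to the .po file) there seem to be two ways around the greedy hex escape, sketched below. The function names manana_split and manana_octal are hypothetical, made up for this illustration. Whether the .po parser treats escapes the same way as the C++ compiler is an assumption I have not verified.

```cpp
#include <string>

// In a C++ literal, a hex escape consumes every following hex digit,
// so writing \xB1 directly before 'a' would be read as the single value 0xB1A.

// Option 1: end the literal right after the escape. Adjacent string
// literals are concatenated by the compiler, so the result is one string.
std::string manana_split() {
    return "ma\xC3\xB1" "ana";
}

// Option 2: use octal escapes, which take at most three digits.
// 0xC3 is octal 303 and 0xB1 is octal 261, so the 'a' cannot be swallowed.
std::string manana_octal() {
    return "ma\303\261ana";
}
```

Both functions should return the same seven bytes: 'm', 'a', 0xC3, 0xB1, 'a', 'n', 'a'.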

Thanks in advance.

Cheers,
Gil Glass
Telecom Field Services
JDSU
Germantown, MD, USA
+1-240-404-2551
