Hi, I have been playing around for two days trying to figure out an issue related to the default charset:
- When I run a trivial job that simply prints the default charset on Hadoop in pseudo-distributed mode, I get US-ASCII. When I print the Java property file.encoding, I get ANSI_X3.4-1968.
- When I run the same job under Eclipse in local mode, I get UTF-8 (which is the one I expect).

I use a Gentoo Linux distribution, and the locale environment variables are the following:

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8

I have tried to set the file.encoding property to UTF-8, but it doesn't work.

Any help would be greatly appreciated. Thank you.

--
Bruno Abitbol
[email protected]
http://www.jobomix.fr
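For context, the check I run boils down to the following standalone probe (a sketch, not my actual job; the class name is made up):

```java
import java.nio.charset.Charset;

public class CharsetProbe {
    public static void main(String[] args) {
        // The charset the JVM actually resolved at startup
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // The raw system property it was derived from
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```

My understanding (please correct me if wrong) is that Hadoop task JVMs are forked by the TaskTracker, so a -Dfile.encoding passed on the client side would not reach them; perhaps it needs to go through the task JVM options (e.g. mapred.child.java.opts with -Dfile.encoding=UTF-8)?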
