Hi, I have been playing around for two days trying to figure out an issue related to the default charset:
- When I run a trivial job that simply prints the default charset on Hadoop in pseudo-distributed mode, I get US-ASCII. When I print the Java property file.encoding, I get ANSI_X3.4-1968.
- When I run the same job under Eclipse in local mode, I get UTF-8 (which is the one I expect).

I use a Gentoo Linux distribution, and the locale environment variables are the following:

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8

I have tried to set the file.encoding property to UTF-8, but it doesn't work.

Any help would be greatly appreciated. Thank you.

--
Bruno Abitbol
[email protected]
http://www.jobomix.fr
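For context, the check I run boils down to the following standalone probe (a sketch, not my actual job; the class name is made up):

```java
import java.nio.charset.Charset;

public class CharsetProbe {
    public static void main(String[] args) {
        // The charset the JVM actually resolved at startup
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // The raw system property it was derived from
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```

My understanding (please correct me if wrong) is that Hadoop task JVMs are forked by the TaskTracker, so a -Dfile.encoding passed on the client side would not reach them; perhaps it needs to go through the task JVM options (e.g. mapred.child.java.opts with -Dfile.encoding=UTF-8)?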
