Text file character encoding

NOMURA Yoshihide Sun, 01 Jun 2008 23:20:28 -0700

Hello,
I'm using Hadoop 0.17.0 to analyze a large number of CSV files.

I need to read files in a character encoding other than UTF-8,
but I think TextInputFormat doesn't support other encodings.


I guess the LineRecordReader class or the Text class should support an
encoding setting like this:
 conf.set("io.file.defaultEncoding", "MS932");

Is there any plan to support different character encodings in
TextInputFormat?
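
In the meantime, one possible workaround might be to decode the raw line
bytes myself inside the map function. The following is only a minimal
sketch under some assumptions: EncodingSketch and decodeLine are
hypothetical names (not part of Hadoop), the bytes are assumed to reach
the mapper unmodified, and the encoding is assumed never to use the
newline byte 0x0A inside a multi-byte character, so that splitting lines
on raw bytes stays safe.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Hadoop.
final class EncodingSketch {

    // Decodes the first `length` bytes of `bytes` with the named charset.
    // Only the first `length` bytes are read, because a reused buffer may
    // be longer than the valid data it currently holds.
    static String decodeLine(byte[] bytes, int length, String charsetName) {
        // Charset.forName throws UnsupportedCharsetException if the JVM
        // does not provide the requested charset.
        return new String(bytes, 0, length, Charset.forName(charsetName));
    }
}
```

In a mapper whose value is a Text, the call would presumably look like
EncodingSketch.decodeLine(value.getBytes(), value.getLength(), "MS932").
Whether "MS932" is available depends on the charsets installed with the
JVM.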

Regards,
-- 
NOMURA Yoshihide:
    Software Innovation Laboratory, Fujitsu Labs. Ltd., Japan
    Tel: 044-754-2675 (Ext: 7112-6358)
    Fax: 044-754-2570 (Ext: 7112-3834)
    E-Mail: [EMAIL PROTECTED]
