[ https://issues.apache.org/jira/browse/HADOOP-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
NOMURA Yoshihide updated HADOOP-3481:
-------------------------------------

    Attachment: Hadoop-3481.patch

This is an updated patch.
In this patch, the LineReader class extracts the encoding setting, and the Text class decodes using the specified charset.
A simple test class is also added.

I think the LineReader constructor and the readLine() method are always called on the same thread. Is that right?

> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
>                 Key: HADOOP-3481
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3481
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.17.0
>         Environment: Windows XP SP3
>            Reporter: NOMURA Yoshihide
>         Attachments: Hadoop-3481.patch
>
>
> I need to read text files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support such character encodings.
> I suggest that TextInputFormat support an encoding setting like this:
>   conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
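
For readers who want to see the idea in isolation, here is a minimal sketch of decoding line bytes with a charset taken from a configuration key. It is not the attached patch and does not use Hadoop's LineReader or Text classes. The property name "io.file.defaultEncoding" comes from the issue description; the class EncodingSketch, its two methods, the use of a plain Map in place of a Hadoop configuration object, and the UTF-8 fallback are assumptions made for illustration only.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical illustration only; not the code in Hadoop-3481.patch.
class EncodingSketch {
    // Property name as proposed in the issue description.
    static final String ENCODING_KEY = "io.file.defaultEncoding";

    // Resolve the charset once (for example, when a reader is constructed).
    // Falling back to UTF-8 when the key is unset is an assumption.
    static Charset resolveCharset(Map<String, String> conf) {
        String name = conf.getOrDefault(ENCODING_KEY, "UTF-8");
        // Throws UnsupportedCharsetException if the JVM lacks this charset.
        return Charset.forName(name);
    }

    // Decode one line's raw bytes using the resolved charset.
    // Malformed input is replaced by the charset's replacement character.
    static String decodeLine(byte[] buf, int offset, int length, Charset cs) {
        return new String(buf, offset, length, cs);
    }
}
```

With the key set to "MS932", resolveCharset would return that charset on JVMs that ship it, and decodeLine would then interpret each line's bytes accordingly. Resolving the charset once and passing it to each decode call avoids a repeated lookup per line.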