[
https://issues.apache.org/jira/browse/HADOOP-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601808#action_12601808
]
Ted Dunning commented on HADOOP-3481:
-------------------------------------
-1 overall
The idea for the patch is excellent. Some of the code could be better.
For instance, this is not a good thing:
+ } catch (Exception e) {
+ // nop
+ }
Firstly, one should never catch "Exception". Secondly, if you catch an
exception, you should do something about it. I don't think it is acceptable to
silently substitute a different character encoding. Instead, there should be a
fatal error.
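As a rough sketch of what I mean (the class and property handling here are
illustrative, not the actual patch): resolve the configured charset once and
let an unsupported name fail loudly.
    import java.nio.charset.Charset;

    /** Illustrative helper, not part of the submitted patch. */
    public class EncodingCheck {
      static Charset resolveEncoding(String name) {
        try {
          return Charset.forName(name);
        } catch (IllegalArgumentException e) {
          // Covers IllegalCharsetNameException and UnsupportedCharsetException:
          // surface a fatal configuration error instead of falling back silently.
          throw new RuntimeException("Unsupported encoding: " + name, e);
        }
      }
    }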
For another, it looks like you are decoding the characters explicitly instead
of just setting the encoding on the input reader. Am I missing something?
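Something along these lines is what I had in mind (purely a sketch; the method
and class names are made up, and the encoding name would come from whatever
configuration key the patch settles on):
    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    /** Illustrative sketch: let the reader handle the decoding. */
    public class EncodedLineReader {
      static BufferedReader open(InputStream in, String encodingName) {
        // Charset.forName fails fast on an unknown or unsupported name.
        Charset cs = Charset.forName(encodingName);
        return new BufferedReader(new InputStreamReader(in, cs));
      }
    }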
> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
> Key: HADOOP-3481
> URL: https://issues.apache.org/jira/browse/HADOOP-3481
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.17.0
> Environment: Windows XP SP3
> Reporter: NOMURA Yoshihide
>
> I need to read text files in a character encoding other than UTF-8,
> but TextInputFormat doesn't seem to support other encodings.
> I suggest that TextInputFormat support an encoding setting like this:
> conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.