[
https://issues.apache.org/jira/browse/HADOOP-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601808#action_12601808
]
Ted Dunning commented on HADOOP-3481:
-------------------------------------
-1 overall
The idea for the patch is excellent. Some of the code could be better.
For instance, this is not a good thing:
+ } catch (Exception e) {
+ // nop
+ }
Firstly, one should never catch "Exception". Secondly, if you catch an
exception, you should do something about it. I don't think it is acceptable to
silently substitute a different character encoding. Instead, there should be a
fatal error.
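As a rough sketch of what I mean (the class and property handling here are
illustrative, not the actual patch): resolve the configured charset once and
let an unsupported name fail loudly.
    import java.nio.charset.Charset;

    /** Illustrative helper, not part of the submitted patch. */
    public class EncodingCheck {
      static Charset resolveEncoding(String name) {
        try {
          return Charset.forName(name);
        } catch (IllegalArgumentException e) {
          // Covers IllegalCharsetNameException and UnsupportedCharsetException:
          // surface a fatal configuration error instead of falling back silently.
          throw new RuntimeException("Unsupported encoding: " + name, e);
        }
      }
    }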
For another, it looks like you are decoding the characters explicitly instead
of just setting the encoding on the input reader. Am I missing something?
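Something along these lines is what I had in mind (purely a sketch; the method
and class names are made up, and the encoding name would come from whatever
configuration key the patch settles on):
    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    /** Illustrative sketch: let the reader handle the decoding. */
    public class EncodedLineReader {
      static BufferedReader open(InputStream in, String encodingName) {
        // Charset.forName fails fast on an unknown or unsupported name.
        Charset cs = Charset.forName(encodingName);
        return new BufferedReader(new InputStreamReader(in, cs));
      }
    }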
> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
> Key: HADOOP-3481
> URL: https://issues.apache.org/jira/browse/HADOOP-3481
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.17.0
> Environment: Windows XP SP3
> Reporter: NOMURA Yoshihide
>
> I need to read text files in a character encoding other than UTF-8,
> but TextInputFormat doesn't seem to support other encodings.
> I suggest that TextInputFormat support an encoding setting like this:
> conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.