[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789453#action_12789453
 ] 

Zheng Shao commented on MAPREDUCE-1277:
---------------------------------------

Hadoop does need to understand the data format on stdout in order to split 
the output into records and each record into key and value.
By default, Hadoop streaming uses UTF-8, with "\n" as the record separator 
and "\t" as the key/value separator.
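
To illustrate why the encoding matters for this splitting, here is a rough 
Python sketch. It is not Hadoop's code: the function name split_records and 
its parameters are made up for illustration, and it assumes that a line 
without a tab is treated as a key with an empty value.

```python
from typing import Iterator, Tuple


def split_records(
    data: bytes,
    encoding: str = "utf-8",   # assumed default, as described above
    record_sep: str = "\n",    # record (line) boundary
    field_sep: str = "\t",     # key/value boundary inside a record
) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from the raw stdout bytes of a child process.

    Hypothetical helper: it only shows that the separators can be found
    reliably once the encoding of the byte stream is known.
    """
    text = data.decode(encoding)
    for line in text.split(record_sep):
        if not line:
            # Skip the empty piece left after a trailing record separator.
            continue
        # Split on the first field separator only; the value keeps any
        # further tabs. Without a separator, the value is empty.
        key, _sep, value = line.partition(field_sep)
        yield key, value


if __name__ == "__main__":
    raw = "k1\tv1\nk2\tv2a\tv2b\nk3\n".encode("utf-8")
    for key, value in split_records(raw):
        print(repr(key), repr(value))
```

The same bytes decoded with the wrong encoding can put the separators in the 
wrong place or fail to decode at all, which is why the caller has to pass 
the encoding in.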

For stderr, Hadoop needs to know the line boundary ("\n") as well: Hadoop 
already supports reporting (counter updates, etc.) through stderr.
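
As a rough sketch of what that reporting looks like from the script's side, 
the Python below builds and classifies stderr lines. The "reporter:counter:" 
and "reporter:status:" prefixes are the ones I recall from the streaming 
documentation; the function names counter_line and parse_stderr_line are 
hypothetical and this is not how Hadoop itself parses the stream.

```python
import sys
from typing import Tuple

COUNTER_PREFIX = "reporter:counter:"
STATUS_PREFIX = "reporter:status:"


def counter_line(group: str, counter: str, amount: int) -> str:
    """Build one stderr line asking for a counter to be incremented."""
    return f"{COUNTER_PREFIX}{group},{counter},{amount}\n"


def parse_stderr_line(line: str) -> Tuple:
    """Classify one already-decoded stderr line (hypothetical helper).

    Returns ("counter", group, counter, amount), ("status", message),
    or ("log", text) for anything that is not a well-formed report.
    """
    line = line.rstrip("\n")
    if line.startswith(COUNTER_PREFIX):
        parts = line[len(COUNTER_PREFIX):].split(",")
        if len(parts) == 3:
            try:
                return ("counter", parts[0], parts[1], int(parts[2]))
            except ValueError:
                pass  # amount is not an integer: treat as a plain log line
    elif line.startswith(STATUS_PREFIX):
        return ("status", line[len(STATUS_PREFIX):])
    return ("log", line)


if __name__ == "__main__":
    # A streaming script would write the report to its own stderr.
    sys.stderr.write(counter_line("MyGroup", "BadRecords", 1))
```

Both the report lines and ordinary log text share one stream, so the reader 
has to find "\n" in it before it can tell them apart.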

As a result, I think it's a better idea to specify the encoding of the 
streams explicitly, rather than making them encoding-free.


> Streaming job should support other characterset in user's stderr log, not 
> only utf8
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1277
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.21.0
>            Reporter: ZhuGuanyin
>            Assignee: ZhuGuanyin
>             Fix For: 0.21.0
>
>         Attachments: streaming-1277.patch
>
>
> The current streaming implementation only supports UTF-8 encoded user 
> stderr logs; it should be encoding-free so that it supports other 
> character sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
