[ https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789453#action_12789453 ]
Zheng Shao commented on MAPREDUCE-1277:
---------------------------------------
Hadoop does need to understand the format of the data on stdout in order to
split it into records, and each record into a key and a value.
By default, Hadoop Streaming assumes UTF-8 text, "\n" as the record separator,
and "\t" as the key/value separator.
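
A minimal Python sketch of that stdout contract, assuming the default separators. `split_records` is a hypothetical helper for illustration, not Hadoop's actual (Java) implementation. It splits on the first tab only, and a line without a tab becomes a key with an empty value.

```python
def split_records(data: bytes, encoding: str = "utf-8",
                  record_sep: str = "\n", field_sep: str = "\t"):
    """Decode a mapper's stdout and split it into (key, value) pairs.

    Uses the streaming defaults: UTF-8 text, one record per line,
    key and value separated by the first tab on the line.
    """
    lines = data.decode(encoding).split(record_sep)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final record separator

    records = []
    for line in lines:
        # partition() splits on the first separator only; if there is none,
        # the whole line is the key and the value is empty.
        key, _, value = line.partition(field_sep)
        records.append((key, value))
    return records


print(split_records(b"k1\tv1\nk2\tv\t2\nlonely\n"))
```

Because only the first tab separates key from value, the second record keeps its embedded tab ("v\t2") in the value.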
For stderr, Hadoop needs to recognize the "\n" line boundary as well, because
Hadoop Streaming already supports reporting (counter updates, status messages,
etc.) through stderr.
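
A minimal Python sketch of why stderr needs line framing. Hadoop Streaming documents reporter lines of the form `reporter:counter:<group>,<counter>,<amount>` and `reporter:status:<message>`. The hypothetical `process_stderr` below (not Hadoop's code) splits stderr on "\n", applies those commands, and keeps every other line as ordinary log output. Its `encoding` parameter is an assumption of this sketch.

```python
def process_stderr(data: bytes, encoding: str = "utf-8"):
    """Split a task's stderr into newline-delimited lines and apply reporter commands.

    Returns (counters, statuses, log_lines), where counters maps
    (group, counter) to the summed amount.
    """
    counter_prefix = "reporter:counter:"
    status_prefix = "reporter:status:"
    counters, statuses, log_lines = {}, [], []

    lines = data.decode(encoding).split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    for line in lines:
        if line.startswith(counter_prefix):
            parts = line[len(counter_prefix):].split(",")
            if len(parts) == 3:
                group, name, amount = parts
                try:
                    delta = int(amount)
                except ValueError:
                    delta = None  # malformed amount: this sketch keeps it as a log line
                if delta is not None:
                    counters[(group, name)] = counters.get((group, name), 0) + delta
                    continue
        elif line.startswith(status_prefix):
            statuses.append(line[len(status_prefix):])
            continue
        log_lines.append(line)

    return counters, statuses, log_lines


print(process_stderr(b"reporter:counter:MyGroup,BadRecords,1\n"
                     b"reporter:status:halfway\n"
                     b"plain log line\n"))
```

If the framing were lost, a reporter command could be glued to the end of a log line and never be recognized.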
As a result, I think it is better to specify the encoding of these streams explicitly.
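
To illustrate why the encoding matters, here is a minimal Python sketch assuming a user program writes GBK-encoded log lines to stderr. The hypothetical `decode_stderr` helper and its `encoding` parameter stand in for a configurable stream encoding. They are not an existing Hadoop option.

```python
from typing import List


def decode_stderr(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Split raw stderr bytes on the newline byte (0x0A), then decode each line.

    Splitting at the byte level is only safe for ASCII-compatible encodings
    such as UTF-8 or GBK, where 0x0A never occurs inside a multi-byte
    character; UTF-16, for example, would need different handling.
    """
    lines = raw.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline
    return [line.decode(encoding) for line in lines]


log = "错误: 输入格式不正确\n".encode("gbk")  # a user log line written in GBK

try:
    decode_stderr(log)  # assumes UTF-8, the streaming default
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)

print(decode_stderr(log, encoding="gbk"))
```

The line framing works the same for both encodings here. Only the decoding step needs to know the real charset, which is why making it explicit is enough.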
> Streaming job should support other characterset in user's stderr log, not only utf8
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.21.0
> Reporter: ZhuGuanyin
> Assignee: ZhuGuanyin
> Fix For: 0.21.0
>
> Attachments: streaming-1277.patch
>
>
> The current implementation in streaming only supports UTF-8-encoded user stderr
> logs; it should be encoding-agnostic so that other character sets are supported.
--
This message is automatically generated by JIRA.