[ 
https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668270#action_12668270
 ] 

Runping Qi commented on HADOOP-1722:
------------------------------------


I don't understand your reasoning about the format for the map output does not 
necessarily have to be the same as the format for the reduce input.
Why the programmer would want the streaming framework to convert the typed 
bytes outputted by a mapper to strings before passing them on to a reducer?
If the programmer wants that, why doesn't the mapper generate text output at 
the first place?

Another issue: it is not uncommon that the mapper output is in text format and 
the reducer output is in binary format (typed bytes).
Your last change of the semantics cannot express this case.



> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>            Assignee: Klaas Bosteels
>         Attachments: HADOOP-1722-v2.patch, HADOOP-1722-v3.patch, 
> HADOOP-1722-v4.patch, HADOOP-1722.patch
>
>
> Right now, the streaming framework expects the output sof the steam process 
> (mapper or reducer) are line 
> oriented UTF-8 text. This limit makes it impossible to use those programs 
> whose outputs may be non-UTF-8
>  (international encoding, or maybe even binary data). Streaming can overcome 
> this limit by introducing a simple
> encoding protocol. For example, it can allow the mapper/reducer to hexencode 
> its keys/values, 
> the framework decodes them in the Java side.
> This way, as long as the mapper/reducer executables follow this encoding 
> protocol, 
> they can output arabitary bytearray and the streaming framework can handle 
> them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to