[ 
https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667346#action_12667346
 ] 

Runping Qi commented on HADOOP-1722:
------------------------------------


Looks good.
A couple of things:
1. Type flags: the user may need to specify two different output type 
flags, one for the map output and one for the reducer output.
2. The reduce input type flag should always be the same as the map output flag, 
and is thus completely independent of the mapper's input type flag.
3. Since the mapper/reducer may be implemented in other languages, such as C, 
we must document the serialization format for TypedBytesWritable clearly and in 
a language-agnostic way. It would be great to have a 
serialization/deserialization library for each common language.
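
To illustrate the level of detail a language-agnostic description would need 
to pin down, here is a rough sketch of a type-tagged, length-prefixed encoding. 
The layout (one type-code byte, a 4-byte big-endian length, then the payload), 
the type codes, and the function names are all assumptions made up for this 
example; this is not the actual TypedBytesWritable format from the patch.

```python
import struct

# Hypothetical type codes, chosen only for this illustration.
TYPE_BYTES = 0
TYPE_STRING = 1


def write_value(value):
    """Encode a bytes or str value as: type code, big-endian length, payload."""
    if isinstance(value, bytes):
        code, payload = TYPE_BYTES, value
    elif isinstance(value, str):
        code, payload = TYPE_STRING, value.encode("utf-8")
    else:
        raise TypeError("unsupported type: %r" % type(value))
    # ">BI" = big-endian, 1 unsigned byte + 4-byte unsigned int, no padding.
    return struct.pack(">BI", code, len(payload)) + payload


def read_value(buf, offset=0):
    """Decode one value starting at offset; return (value, next_offset)."""
    code, length = struct.unpack_from(">BI", buf, offset)
    start = offset + 5
    payload = buf[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if code == TYPE_BYTES:
        return payload, start + length
    if code == TYPE_STRING:
        return payload.decode("utf-8"), start + length
    raise ValueError("unknown type code: %d" % code)
```

Fixing the byte order and the width of the length field in the documentation 
is what lets a C or Python library interoperate with the Java side without 
guessing.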


> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>            Assignee: Christopher Zimmerman
>         Attachments: HADOOP-1722.patch
>
>
> Right now, the streaming framework expects the output of the streaming process 
> (mapper or reducer) to be line-oriented UTF-8 text. This limitation makes it 
> impossible to use programs whose output may be non-UTF-8 
> (international encodings, or even binary data). Streaming can overcome 
> this limitation by introducing a simple encoding protocol. For example, it can 
> allow the mapper/reducer to hex-encode its keys/values, and the framework 
> decodes them on the Java side. 
> This way, as long as the mapper/reducer executables follow this encoding 
> protocol, they can output arbitrary byte arrays and the streaming framework 
> can handle them.
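
The hex-encoding idea in the description above could look roughly like this 
on the mapper/reducer side. This is a minimal sketch that assumes records stay 
line-oriented, with a tab between key and value; the function names are 
hypothetical and not part of any streaming API.

```python
import binascii


def encode_record(key, value):
    """Hex-encode key and value so the output line is plain ASCII.

    Tabs and newlines inside the raw bytes become hex digits, so they
    cannot be confused with the field separator or the line terminator.
    """
    return binascii.hexlify(key) + b"\t" + binascii.hexlify(value) + b"\n"


def decode_record(line):
    """Reverse encode_record: split on the first tab and unhexlify both parts."""
    hex_key, hex_value = line.rstrip(b"\n").split(b"\t", 1)
    return binascii.unhexlify(hex_key), binascii.unhexlify(hex_value)
```

A process would write `encode_record(key, value)` to its standard output, and 
the framework side would apply the equivalent of `decode_record` to each line. 
The cost is that hex encoding doubles the size of every key and value.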

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
