[ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523638 ]

arkady borkovsky commented on HADOOP-1722:
------------------------------------------

Passing data from DFS to the streaming mapper should be transparent:
by default, the mapper task should receive exactly the same bytes as stored
in DFS, without any transformation.
There should also be command line parameters that specify other useful
options, including a custom input format, decompression, etc.
There should be no requirements on the command that is used as the Streaming mapper.

This has been broken twice -- in September 2006 and in July 2007.
It would be nice to restore the functionality and make it part of the
specification.  (This implies adding regression test cases, etc.)

> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>
> Right now, the streaming framework expects the output of the stream process
> (mapper or reducer) to be line-oriented UTF-8 text. This limit makes it
> impossible to use programs whose outputs may be non-UTF-8 (international
> encodings, or maybe even binary data). Streaming can overcome this limit by
> introducing a simple encoding protocol. For example, it can allow the
> mapper/reducer to hex-encode its keys/values, and the framework decodes
> them on the Java side. This way, as long as the mapper/reducer executables
> follow this encoding protocol, they can output arbitrary byte arrays and
> the streaming framework can handle them.
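The hex-encoding protocol proposed above could be sketched roughly as follows. This is a hypothetical illustration, not the actual Streaming implementation: the function names and the tab-separated key/value layout are assumptions for the sake of the example.

```python
import sys

def hex_encode_mapper():
    # Hypothetical streaming mapper side: read raw bytes from stdin and emit
    # hex-encoded key/value pairs, so the output is always safe line-oriented
    # ASCII no matter what encoding (or binary data) the input contains.
    data = sys.stdin.buffer.read()
    for record in data.split(b"\n"):
        key, _, value = record.partition(b"\t")
        sys.stdout.write(key.hex() + "\t" + value.hex() + "\n")

def hex_decode(line):
    # Hypothetical framework (Java-side) decoding, shown here in Python:
    # recover the original key/value bytes from one hex-encoded output line.
    key_hex, _, value_hex = line.rstrip("\n").partition("\t")
    return bytes.fromhex(key_hex), bytes.fromhex(value_hex)
```

With such a protocol the executable's output is decoded back to the exact original bytes, so arbitrary byte arrays survive the round trip through the line-oriented transport.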

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
