[
https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670669#action_12670669
]
Owen O'Malley commented on HADOOP-1722:
---------------------------------------
I'd suggest that we generalize this a little bit more and make the
MRInputWriter and MROutputWriter take the input and output stream. And then,
instead of a boolean property per format, use a single property whose value
names the format:
stream.map.input.typed.bytes=true -> stream.map.input=typed.bytes
That way, if someone wants to add another encoder, it is trivial to do so.
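To illustrate the point about a single property, here is a minimal sketch in Python (not the patch's Java code) of looking up a writer by the value of stream.map.input. The class names, the INPUT_WRITERS registry, the "text" and "length.prefixed" identifiers, and the make_input_writer helper are all hypothetical; only the property name stream.map.input comes from the suggestion above. A typed bytes writer would register under "typed.bytes" in the same way.

```python
import io
import struct


class TextInputWriter:
    """Writes key<TAB>value<NEWLINE> records to the stream it was given."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, key: bytes, value: bytes) -> None:
        self.stream.write(key + b"\t" + value + b"\n")


class LengthPrefixedInputWriter:
    """Writes each field as a 4-byte big-endian length followed by its bytes."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, key: bytes, value: bytes) -> None:
        for field in (key, value):
            self.stream.write(struct.pack(">I", len(field)) + field)


# Hypothetical registry keyed by the value of stream.map.input.
# Adding an encoder means adding one entry here, not a new boolean property.
INPUT_WRITERS = {
    "text": TextInputWriter,
    "length.prefixed": LengthPrefixedInputWriter,
}


def make_input_writer(conf: dict, stream):
    """Return the writer named by stream.map.input, defaulting to text."""
    name = conf.get("stream.map.input", "text")
    try:
        writer_class = INPUT_WRITERS[name]
    except KeyError:
        raise ValueError(f"unknown stream.map.input encoder: {name!r}") from None
    return writer_class(stream)
```

Because each writer receives the stream in its constructor, the caller decides where the bytes go, which is the other half of the generalization suggested above.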
At some point, we probably should make the AutoInputFormat more complete and
promote it into mapreduce.lib, but clearly that can happen in a different patch.
I'm not that convinced that people will use the typed bytes format outside of
Python, where the library is already present, but it does give them an option,
which is currently not there.
> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
> Key: HADOOP-1722
> URL: https://issues.apache.org/jira/browse/HADOOP-1722
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Runping Qi
> Assignee: Klaas Bosteels
> Attachments: HADOOP-1722-v2.patch, HADOOP-1722-v3.patch,
> HADOOP-1722-v4.patch, HADOOP-1722-v4.patch, HADOOP-1722.patch
>
>
> Right now, the streaming framework expects the output of the streaming
> process (mapper or reducer) to be line-oriented UTF-8 text. This limitation
> makes it impossible to use programs whose output may be non-UTF-8
> (international encodings, or maybe even binary data). Streaming can overcome
> this limitation by introducing a simple encoding protocol. For example, it
> can allow the mapper/reducer to hex-encode its keys/values, and the framework
> decodes them on the Java side.
> This way, as long as the mapper/reducer executables follow this encoding
> protocol, they can output arbitrary byte arrays and the streaming framework
> can handle them.
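The hex-encoding idea in the description can be sketched as follows, in Python for brevity. This is only an illustration of the proposal, not what the attached patches implement; the function names encode_record and decode_record are hypothetical, and the tab-separated, newline-terminated record layout is an assumption based on streaming's line-oriented text convention.

```python
import binascii


def encode_record(key: bytes, value: bytes) -> bytes:
    """Mapper/reducer side: hex-encode key and value into one text line.

    Hex digits are plain ASCII, so the line is valid UTF-8 and cannot
    contain a stray tab or newline, whatever bytes the key or value hold.
    """
    return binascii.hexlify(key) + b"\t" + binascii.hexlify(value) + b"\n"


def decode_record(line: bytes):
    """Framework side: split the line on the tab and decode both fields."""
    hex_key, hex_value = line.rstrip(b"\n").split(b"\t", 1)
    return binascii.unhexlify(hex_key), binascii.unhexlify(hex_value)
```

The cost of this scheme is that every byte becomes two characters on the wire, which is one reason a binary format such as typed bytes may be preferable.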