[ https://issues.apache.org/jira/browse/MAPREDUCE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583715#comment-13583715 ]
Josh Hansen commented on MAPREDUCE-377:
---------------------------------------
writeDelimitedTo(OutputStream), mergeDelimitedFrom(InputStream), and
parseDelimitedFrom(InputStream) are now all part of the standard Protocol
Buffers library; see
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream)
That removes one obvious obstacle to addressing this issue.
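For anyone unfamiliar with what the delimited methods buy us: they frame each
message with its byte length, encoded as a base-128 varint, so several messages
can share one stream. The following is only a rough, hand-rolled sketch of that
framing using plain byte arrays and the JDK. It is not the library's code; the
class and method names (DelimitedFraming, writeDelimited, readDelimited) are
made up for illustration, and returning null at end of stream is a choice made
for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of varint length-prefixed framing.
// In real code the payload would be a serialized protobuf message.
public class DelimitedFraming {

    /** Writes the payload length as a base-128 varint, then the payload. */
    static void writeDelimited(byte[] payload, OutputStream out) throws IOException {
        int length = payload.length;
        // Emit 7 bits at a time, low bits first; the high bit marks "more bytes follow".
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        out.write(length);
        out.write(payload);
    }

    /** Reads one framed record, or returns null if the stream is already at EOF. */
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // clean end of stream: no more records
        }
        int length = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            if (shift >= 32) {
                throw new IOException("length prefix too long");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
            length |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (length < 0) {
            throw new IOException("negative record length");
        }
        byte[] payload = new byte[length];
        int off = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (off < length) {
            int n = in.read(payload, off, length - off);
            if (n == -1) {
                throw new EOFException("truncated record");
            }
            off += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDelimited("alpha".getBytes(StandardCharsets.UTF_8), out);
        writeDelimited("beta".getBytes(StandardCharsets.UTF_8), out);

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

With the real library, the two helpers would be replaced by calls to
writeDelimitedTo(OutputStream) on the message and parseDelimitedFrom(InputStream)
on the generated message class; the point here is only that record boundaries
are recoverable without any external index.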
There were questions a few years ago about whether this issue is still
relevant; I agree with Tom White that it is very relevant for anyone who wants
to use protobuf data in Hadoop MapReduce. Avro in particular doesn't meet my
organization's needs because it lacks a sparse representation.
Twitter's elephant-bird library (https://github.com/kevinweil/elephant-bird)
provides some protobuf-in-Hadoop support, but it is far from obvious how to use
it with protobufs that are not LZO-compressed.
> Add serialization for Protocol Buffers
> --------------------------------------
>
> Key: MAPREDUCE-377
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-377
> Project: Hadoop Map/Reduce
> Issue Type: Wish
> Reporter: Tom White
> Assignee: Alex Loddengaard
> Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch,
> hadoop-3788-v3.patch, protobuf-java-2.0.1.jar, protobuf-java-2.0.2.jar
>
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding
> data in a compact binary format. This issue is to write a
> ProtocolBuffersSerialization to support using Protocol Buffers types in
> MapReduce programs, including an example program. This should probably go
> into contrib.