[jira] [Commented] (MAPREDUCE-377) Add serialization for Protocol Buffers

Josh Hansen (JIRA) Thu, 21 Feb 2013 16:12:15 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583715#comment-13583715 ]


Josh Hansen commented on MAPREDUCE-377:
---------------------------------------

writeDelimitedTo(OutputStream), mergeDelimitedFrom(InputStream), and 
parseDelimitedFrom(InputStream) are now all part of the standard Protocol 
Buffers library. See 
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream)
Because these methods prefix each message with its length, several messages 
can be written to and read back from a single stream. That should resolve one 
obvious obstacle to addressing this issue.
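
To make the framing concrete, here is a minimal sketch of the idea behind the 
delimited methods: each record is written as a base-128 varint length followed 
by the payload bytes. This is an illustration using only the JDK, not the 
library's own code; the class DelimitedFraming and its method names are 
hypothetical, and the payloads are plain byte arrays standing in for 
serialized messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: varint-length-prefixed records on a byte stream.
class DelimitedFraming {

    // Write one record: the payload length as a base-128 varint, then the payload.
    static void writeDelimited(byte[] payload, OutputStream out) throws IOException {
        int len = payload.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80); // low 7 bits, continuation bit set
            len >>>= 7;
        }
        out.write(len);                     // final byte, continuation bit clear
        out.write(payload);
    }

    // Read one record, or return null if the stream is already at end of input.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;                    // clean end of stream
        }
        int len = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            b = in.read();
            if (b == -1) throw new EOFException("truncated length prefix");
            if (shift > 28) throw new IOException("length prefix too long");
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (len < 0) throw new IOException("negative record length");
        byte[] payload = new byte[len];
        int off = 0;
        while (off < len) {                 // read() may return fewer bytes than asked
            int n = in.read(payload, off, len - off);
            if (n == -1) throw new EOFException("truncated record");
            off += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDelimited("alpha".getBytes(StandardCharsets.UTF_8), out);
        writeDelimited("beta".getBytes(StandardCharsets.UTF_8), out);

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

With real protobuf messages, the generated class's writeDelimitedTo and 
parseDelimitedFrom would take the place of these two helpers. How a Hadoop 
serializer would wrap them is left open here.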

There were questions a few years ago about whether this issue is still 
relevant. I agree with Tom White that it is very relevant for people who want 
to use their protobuf data in Hadoop MapReduce. Avro in particular doesn't meet 
the needs of my organization due to its lack of a sparse representation.

Twitter's elephant-bird library (https://github.com/kevinweil/elephant-bird) 
provides some protobuf-in-Hadoop support, but it is not obvious how to use it 
with protobufs that are not LZO-compressed.
                
> Add serialization for Protocol Buffers
> --------------------------------------
>
>                 Key: MAPREDUCE-377
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-377
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>            Reporter: Tom White
>            Assignee: Alex Loddengaard
>         Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, 
> hadoop-3788-v3.patch, protobuf-java-2.0.1.jar, protobuf-java-2.0.2.jar
>
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding 
> data in a compact binary format. This issue is to write a 
> ProtocolBuffersSerialization to support using Protocol Buffers types in 
> MapReduce programs, including an example program. This should probably go 
> into contrib. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
