[
https://issues.apache.org/jira/browse/HADOOP-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628035#action_12628035
]
Tom White commented on HADOOP-3788:
-----------------------------------
Alex, thanks for looking at this.
It shouldn't be necessary to create a new Writable implementation for each
protoc-generated class (if that's what you are suggesting). By writing a
ProtocolBuffersSerialization it should be possible to avoid having to use
Writables at all.
I imagined that the implementation of ProtocolBuffersSerializer would create a
CodedOutputStream in the open method, then call Message#writeTo with the
CodedOutputStream in the serialize method. ProtocolBuffersDeserializer is a bit
trickier: it would find the com.google.protobuf.Descriptors.Descriptor for
the message class being deserialized, then use DynamicMessage#parseFrom to
construct a message from the descriptor and the input stream.
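A rough, untested sketch of what I have in mind (all class names here are hypothetical; it assumes the org.apache.hadoop.io.serializer interfaces and the protobuf Java runtime):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.Message;

import org.apache.hadoop.io.serializer.Deserializer;
import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.Serializer;

/**
 * Hypothetical sketch: a single Serialization that covers every
 * protoc-generated class, so no per-class Writable is needed.
 */
public class ProtocolBuffersSerialization implements Serialization<Message> {

  public boolean accept(Class<?> c) {
    // One check covers all protoc-generated classes: they extend Message.
    return Message.class.isAssignableFrom(c);
  }

  public Serializer<Message> getSerializer(Class<Message> c) {
    return new ProtocolBuffersSerializer();
  }

  public Deserializer<Message> getDeserializer(Class<Message> c) {
    return new ProtocolBuffersDeserializer(c);
  }

  static class ProtocolBuffersSerializer implements Serializer<Message> {
    private CodedOutputStream out;

    public void open(OutputStream out) throws IOException {
      // Create the CodedOutputStream once, in open.
      this.out = CodedOutputStream.newInstance(out);
    }

    public void serialize(Message message) throws IOException {
      message.writeTo(out);  // Message#writeTo(CodedOutputStream)
      out.flush();           // push the record's bytes to the stream
    }

    public void close() throws IOException {
      out.flush();
    }
  }

  static class ProtocolBuffersDeserializer implements Deserializer<Message> {
    private final Descriptor descriptor;
    private InputStream in;

    ProtocolBuffersDeserializer(Class<? extends Message> c) {
      try {
        // Every protoc-generated class exposes a static getDescriptor().
        descriptor = (Descriptor) c.getMethod("getDescriptor").invoke(null);
      } catch (Exception e) {
        throw new IllegalArgumentException(
            c + " is not a protoc-generated class", e);
      }
    }

    public void open(InputStream in) throws IOException {
      this.in = in;
    }

    public Message deserialize(Message message) throws IOException {
      // Reconstruct the message from the descriptor and the input stream.
      return DynamicMessage.parseFrom(descriptor, in);
    }

    public void close() throws IOException {
      in.close();
    }
  }
}
```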
To test this you could write some PB types to a Hadoop sequence file, then
write a MapReduce program to process it and write it out to another sequence
file containing PB types. See HADOOP-3787.
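For the writing side, something along these lines might do (again hypothetical and untested; MyProto.Person stands in for some protoc-generated class, and the serialization is registered via the io.serializations property):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;

public class WritePBSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Make the new serialization visible to SequenceFile, alongside
    // the default WritableSerialization.
    conf.setStrings("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization",
        "ProtocolBuffersSerialization");

    FileSystem fs = FileSystem.getLocal(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
        new Path("people.seq"), NullWritable.class, MyProto.Person.class);
    try {
      // MyProto.Person is a placeholder for any protoc-generated type.
      writer.append(NullWritable.get(),
          MyProto.Person.newBuilder().setName("alice").build());
    } finally {
      writer.close();
    }
    // A follow-up MapReduce job (along the lines of HADOOP-3787) would
    // then read people.seq and write PB values to another sequence file.
  }
}
```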
> Add serialization for Protocol Buffers
> --------------------------------------
>
> Key: HADOOP-3788
> URL: https://issues.apache.org/jira/browse/HADOOP-3788
> Project: Hadoop Core
> Issue Type: Wish
> Components: examples, mapred
> Reporter: Tom White
> Assignee: Alex Loddengaard
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding
> data in a compact binary format. This issue is to write a
> ProtocolBuffersSerialization to support using Protocol Buffers types in
> MapReduce programs, including an example program. This should probably go
> into contrib.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.