[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932829#action_12932829 ]
Doug Cutting commented on HADOOP-6685:
--------------------------------------
> you'd like Avro to be the one and only serialization format that Hadoop
> supports
Not quite. I'd like Hadoop to encourage a primary persistent data file format,
to better enable a rich ecosystem. Currently we promote Writables in
SequenceFiles. The Avro project was launched with the goal of providing a
second-generation alternative to this. But if there's another candidate that
would serve better I'd gladly entertain it. The success of Avro is secondary
to this goal.
Neither Thrift nor Protocol Buffers defines a standard data file format. If
someone implemented a cross-platform container file format based on Protocol
Buffers or Thrift that supported compression and was splittable, it would be a
strong contender. Avro's primary advantage over these is that it's more
dynamic, e.g., supporting easy creation of new datatypes without a code
generate/load cycle, but Thrift and Protocol Buffers are more mature and
support more programming languages, so could present compelling alternatives if
they offered an appropriate file format.
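
To make "splittable" concrete, here is a minimal sketch of one way a container file can be made splittable: every block is preceded by a sync marker, so a reader handed an arbitrary byte range can scan forward to the next marker and start there. The class name, the 8-byte marker, and the block layout are all hypothetical and chosen for illustration; this is not the actual layout of SequenceFile, Avro, or any other format. Compression is left out, but each block's payload could be compressed independently without affecting the split logic.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplittableContainerSketch {
    // Hypothetical fixed marker. A real format would more likely pick a random
    // marker per file so that payload bytes are very unlikely to collide with it.
    static final byte[] SYNC = {
        (byte) 0xCA, (byte) 0xFE, 0x13, 0x37, (byte) 0xBE, (byte) 0xEF, 0x42, 0x24
    };

    // Writes each block as: SYNC marker, 4-byte big-endian length, payload bytes.
    static byte[] write(List<String> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String block : blocks) {
            byte[] payload = block.getBytes(StandardCharsets.UTF_8);
            out.write(SYNC, 0, SYNC.length);
            out.write(ByteBuffer.allocate(4).putInt(payload.length).array(), 0, 4);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Returns the blocks whose sync marker begins inside [start, end).
    // A block that starts in the range is read in full, even past 'end',
    // so adjacent splits together cover every block exactly once.
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> result = new ArrayList<>();
        int pos = indexOfSync(file, start);
        while (pos >= 0 && pos < end) {
            int length = ByteBuffer.wrap(file, pos + SYNC.length, 4).getInt();
            int dataStart = pos + SYNC.length + 4;
            result.add(new String(file, dataStart, length, StandardCharsets.UTF_8));
            pos = indexOfSync(file, dataStart + length);
        }
        return result;
    }

    // Linear scan for the next sync marker at or after 'from'; -1 if none.
    static int indexOfSync(byte[] file, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + SYNC.length), SYNC)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] file = write(Arrays.asList("alpha", "beta", "gamma"));
        int mid = file.length / 2;
        // Two readers process the two halves independently.
        System.out.println(readSplit(file, 0, mid));
        System.out.println(readSplit(file, mid, file.length));
    }
}
```

The second reader starts in the middle of a block, skips ahead to the next marker, and picks up from there, which is what lets separate tasks process byte ranges of one large file in parallel.
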
We should not force a single persistent data format. We should continue to
include flexible APIs that permit arbitrary data formats.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).