[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934644#action_12934644 ]
Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
The first is that no change is needed in SequenceFile unless we want to support Avro, but, given that Avro data files were designed for this, and are multi-lingual, why change the SequenceFile format solely to support Avro? Are Avro data files insufficient? Note that Thrift and Protocol Buffers can be stored in today's SequenceFiles.
{quote}

This isn't true. SequenceFile needs to be changed to support the new serialization API. The class name isn't sufficient to determine the serialization. Furthermore, you can't implement context sensitive serializations MAPREDUCE-1462 without the changes to SequenceFile.

{quote}
Are Avro data files insufficient?
{quote}

Yes. They don't support indices. They don't support key, value pairs. They don't support other types like Writables. Furthermore, our users already heavily use SequenceFiles and don't want to port to a new file format. Extending SequenceFile gives them more flexibility.

{quote}
I wonder if JSON might be a good nestable format for serialization metadata? JSON supports nesting, and distinguishes numeric, boolean and string types. With Jackson, one can serialize and deserialize Java objects as JSON, to get compile-time type checking.
{quote}

In MAPREDUCE-980, you took out the custom JSON parser and replaced it with calls into Avro. Using ProtoBuf is efficient and meant that I wrote 2 lines of code. If I used JSON, I would need to write a parser and printer.
> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
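
For readers following the thread, here is a rough sketch of the idea in the description: each serialization owns the encoding of its configuration and hands the framework only a name plus an opaque byte array. None of this is taken from the attached patches. `SketchSerialization`, `SchemaSketchSerialization`, `SketchHeader`, and the `writeMetadata`/`readMetadata` methods are hypothetical names invented for illustration, and the blob is encoded with plain `DataOutputStream` rather than ProtoBuf only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface (not the Hadoop API): the serialization encodes its
// own configuration and exposes it to the framework only as bytes.
interface SketchSerialization {
    String name();
    byte[] writeMetadata();
    void readMetadata(byte[] metadata);
}

// Hypothetical serialization whose private configuration is a schema string
// plus a flag. A class name alone could not carry either value.
final class SchemaSketchSerialization implements SketchSerialization {
    String schema = "";
    boolean sorted = false;

    @Override
    public String name() {
        return "schema-sketch";
    }

    @Override
    public byte[] writeMetadata() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(schema);
            out.writeBoolean(sorted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @Override
    public void readMetadata(byte[] metadata) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(metadata))) {
            schema = in.readUTF();
            sorted = in.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical container header (think of a file header or job config entry):
// it stores the serialization name and the blob without interpreting the blob.
final class SketchHeader {
    final String serializationName;
    final byte[] metadata;

    SketchHeader(SketchSerialization serialization) {
        this.serializationName = serialization.name();
        this.metadata = serialization.writeMetadata();
    }

    // Restores a serialization's private state from the stored blob.
    void configure(SketchSerialization target) {
        target.readMetadata(metadata);
    }
}
```

In this sketch the container never parses the metadata, so a serialization can change what it stores without touching the framework interface, which is the simplification the description argues for.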