[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934030#action_12934030 ]

Doug Cutting commented on HADOOP-6685:
--------------------------------------

> There is petabytes of data in SequenceFile format in Hadoop clusters
> everywhere. We cannot drop it, we need to maintain it and keep it up to date.
> We also need to improve to continue to support existing users.

I have never proposed dropping SequenceFile. I have proposed that we not extend it. I have proposed that if we introduce a new concrete binary object data file format (container+serialization) then we should only introduce a single such second-generation format. If we cannot agree on such a format, then we will be stuck adding no new formats to the kernel but rather creating new formats in external projects.

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch,
>                      serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.