[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934523#action_12934523 ]
Doug Cutting commented on HADOOP-6685:
--------------------------------------

> I question the validity of Doug's veto. His objection to the patch has
> nothing to do with the merits of the patch and everything to do with his wish
> to push Avro into Hadoop at the cost of the users.

I have withdrawn my hope to add Avro as a data format in Hadoop, since the Avro project now provides a Hadoop data format layered on Hadoop. To my knowledge there are currently no Hadoop issues pushing Avro into Hadoop as a data format besides this issue, and I do not currently intend to file any new such Hadoop issues. (Avro's layer would be more easily implemented if the shuffle better supported non-Writable data, but, as it stands, it is adequate.)

We should refrain from adding any new data formats to the Hadoop kernel. More generally, we should refrain from adding to the kernel code that could be implemented as user code. At present, the kernel must contain some framework code that runs in a user's tasks, e.g., sorting code that calls the user's comparator. Beyond that required framework code, however, code that runs in user tasks should not be provided with the system, but should rather be supplied by the user. User tasks should ideally be able to, e.g., run a different version of the HDFS client code.

We have a fair amount of legacy code, like SequenceFile, that is currently provided with the system and that we cannot immediately remove for compatibility reasons. But new user-level functionality should be provided as external packages, not shipped with the kernel. If we wish to enhance the SequenceFile data format, that should be done in a separate project. The line between user and system code is currently blurred, and we should work to clarify it and reduce the amount of user code in this project, providing a level playing field for user-code libraries.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization-specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
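The change the issue describes — carrying serialization-specific configuration as an opaque byte blob the framework stores verbatim, instead of a Map<String,String> it must understand — can be sketched roughly as below. This is a minimal illustrative sketch, not Hadoop's actual serialization API: the interface and class names (SerializationMetadata, AvroSchemaMetadata, OpaqueMetadataDemo) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed shape: the framework sees only an
// opaque byte[]; the meaning of the bytes is internal to one serialization.
interface SerializationMetadata {
    byte[] toBytes(); // opaque to the framework; only the owning serialization parses it
}

// Illustrative serialization-specific metadata, e.g. an Avro schema as JSON.
final class AvroSchemaMetadata implements SerializationMetadata {
    private final String schemaJson;

    AvroSchemaMetadata(String schemaJson) {
        this.schemaJson = schemaJson;
    }

    @Override
    public byte[] toBytes() {
        // The framework stores this blob verbatim; it never inspects the contents.
        return schemaJson.getBytes(StandardCharsets.UTF_8);
    }

    static AvroSchemaMetadata fromBytes(byte[] blob) {
        // Only the matching serialization knows how to decode the blob.
        return new AvroSchemaMetadata(new String(blob, StandardCharsets.UTF_8));
    }

    String schemaJson() {
        return schemaJson;
    }
}

public class OpaqueMetadataDemo {
    public static void main(String[] args) {
        AvroSchemaMetadata md = new AvroSchemaMetadata("{\"type\":\"string\"}");
        byte[] blob = md.toBytes();                        // framework stores the blob
        AvroSchemaMetadata restored = AvroSchemaMetadata.fromBytes(blob);
        System.out.println(restored.schemaJson().equals(md.schemaJson()));
    }
}
```

The design point is that the Map<String,String> version forces a key/value vocabulary into the framework's contract, while a blob keeps the framework's contract to "store and hand back these bytes," which is what makes it simpler for context-specific serializations (MAPREDUCE-1462) and for serialized Mapper/Reducer objects (MAPREDUCE-1183).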