[VOTE] Direction for Hadoop development

Owen O'Malley Mon, 29 Nov 2010 14:31:22 -0800

All,

Based on the discussion on HADOOP-6685, there is a pretty fundamental difference of opinion about how Hadoop should evolve. We need to figure out how the majority of the PMC wants the project to evolve to understand which patches move us forward. Please vote whether you approve of the following direction. Clearly as the author, I'm +1.


-- Owen

Hadoop has always included library code so that users had a strong foundation to build their applications on without needing to continually reinvent the wheel. This combination of framework and powerful library code is a common pattern for successful projects, such as Java, Lucene, etc. Toward that end, we need to continue to extend the Hadoop library code and actively maintain it as the framework evolves. Continuing support for SequenceFile and TFile, which are both widely used, is mandatory. The opposite pattern of implementing the framework and letting each distribution add the required libraries will lead to increased community fragmentation and vendor lock-in.

Hadoop's generic serialization framework had a lot of promise when it was introduced, but has been hampered by a lack of plugins other than Writables and Java serialization. Supporting a wide range of serializations natively in Hadoop will give the users new capabilities. Currently, to support Avro or ProtoBuf objects, mutually incompatible third-party solutions are required. It benefits Hadoop to support them with a common framework that will support all of them. In particular, having easy, out-of-the-box support for Thrift, ProtoBufs, Avro, and our legacy serializations is a desired state.

As a distributed system, there are many instances where Hadoop needs to serialize data. Many of those applications need a lightweight, versioned serialization framework like ProtocolBuffers or Thrift and using them is appropriate. Adding dependencies on Thrift and ProtocolBuffers to the previous dependency on Avro is acceptable.
