On Apr 2, 2009, at 5:11 PM, Abhishek Verma wrote:
I am a newbie here. Why not use something existing like protocol
buffers: http://code.google.com/p/protobuf/ which is open source and
works amazingly well.
There are two blockers for protocol buffers that make them suboptimal
for Hadoop. They are:
1. Protocol buffers are open source, but the community isn't open.
Google doesn't seem interested in accepting patches from outside the
company. If we needed something changed in protocol buffers, we'd end
up needing to fork the project to make any progress.
2. Protocol buffers (and Thrift) encode the field names as id numbers.
That means that if you read them into a dynamic language like Python,
it has to use the field numbers instead of the field names. In
Avro, the field names are saved and there are no field ids.
A final point is that since the schema isn't inlined with the data in
Avro, the binary representation is much tighter than that of protocol
buffers.
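
To make the last two points concrete, here is a rough sketch in plain
Python (no protobuf or Avro library involved). It hand-rolls a tagged
varint encoding in the style of protocol buffers and an untagged,
schema-ordered zigzag encoding in the style of Avro longs. The record,
the SCHEMA list, and all function names are made up for illustration,
and the sketch assumes the reader obtains the schema separately from
the record bytes. It is not either project's actual API.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit = more."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def zigzag(n: int) -> int:
    """Map a signed 64-bit int to unsigned so small magnitudes stay short."""
    return (n << 1) ^ (n >> 63)


def unzigzag(n: int) -> int:
    return (n >> 1) ^ -(n & 1)


def tagged_encode(fields: dict[int, int]) -> bytes:
    """Tagged style: every value is preceded by (field_number << 3 | wire_type)."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        out += encode_varint((number << 3) | 0)  # wire type 0 = varint
        out += encode_varint(value)
    return bytes(out)


def tagged_decode(data: bytes) -> dict[int, int]:
    """Without the schema, all we can recover are field numbers."""
    fields, pos = {}, 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)
        fields[tag >> 3] = value
    return fields


# Hypothetical schema: an ordered list of field names, all of type long.
SCHEMA = ["id", "count"]


def schema_encode(schema: list[str], record: dict[str, int]) -> bytes:
    """Schema-driven style: values only, in schema order, no per-field tags."""
    return b"".join(encode_varint(zigzag(record[name])) for name in schema)


def schema_decode(schema: list[str], data: bytes) -> dict[str, int]:
    """The schema supplies the names, so the reader gets them back."""
    record, pos = {}, 0
    for name in schema:
        raw, pos = decode_varint(data, pos)
        record[name] = unzigzag(raw)
    return record


tagged = tagged_encode({1: 150, 2: 3})
untagged = schema_encode(SCHEMA, {"id": 150, "count": 3})
print(len(tagged), tagged_decode(tagged))
print(len(untagged), schema_decode(SCHEMA, untagged))
```

For this two-field record the tagged form takes 5 bytes and decodes to
keys 1 and 2, while the schema-driven form takes 3 bytes and decodes to
"id" and "count". The gap is one tag byte per field here. How it plays
out on real data depends on the field types and values.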
-- Owen