Re: [PROPOSAL] new subproject: Avro

Owen O'Malley Thu, 02 Apr 2009 21:29:36 -0700


On Apr 2, 2009, at 5:11 PM, Abhishek Verma wrote:

I am a newbie here. Why not use something existing like protocol buffers: http://code.google.com/p/protobuf/ which is open source and works amazingly well.

There are two blockers for protocol buffers that make them suboptimal for Hadoop. They are:

1. Protocol buffers are open source, but the community isn't open. Google doesn't seem interested in getting patches from outside of itself. If we needed something changed in protocol buffers, we'd end up needing to fork the project to make any progress.

2. Protocol buffers (and Thrift) encode the field names as id numbers. That means that if you read them into a dynamic language like Python, it has to use the field numbers instead of the field names. In Avro, the field names are saved and there are no field ids.

A final point is that since the schema isn't inlined in Avro, the binary representation is much tighter than protocol buffers.
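
To make the last two points concrete, here is a minimal Python sketch. It is a toy illustration only: neither encoder is the real protocol buffers or Avro wire format, and the schema, field ids, and function names are all hypothetical. It shows that a tagged encoding spends a byte per field on the id and decodes to numbers when no generated code is available, while a schema-driven encoding writes only the values and takes the names from the schema.

```python
import struct

# Hypothetical schema: (field name, field id, type). Not from any real project.
SCHEMA = [("id", 1, int), ("name", 2, str)]

def encode_value(value):
    """Encode an int as 4 big-endian bytes, a str as a length byte plus UTF-8."""
    if isinstance(value, int):
        return struct.pack(">i", value)
    data = value.encode("utf-8")
    return bytes([len(data)]) + data

def decode_value(data, pos, is_str):
    """Decode one value starting at pos; return (value, next position)."""
    if is_str:
        n = data[pos]
        return data[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n
    return struct.unpack_from(">i", data, pos)[0], pos + 4

def encode_tagged(record):
    """Toy tagged format: each value is preceded by a byte holding id and kind."""
    out = b""
    for name, field_id, typ in SCHEMA:
        tag = (field_id << 1) | (1 if typ is str else 0)
        out += bytes([tag]) + encode_value(record[name])
    return out

def decode_tagged(data):
    """Decode without the schema: only the numeric field ids are available."""
    record, pos = {}, 0
    while pos < len(data):
        tag = data[pos]
        value, pos = decode_value(data, pos + 1, bool(tag & 1))
        record[tag >> 1] = value
    return record

def encode_schema_ordered(record):
    """Toy schema-driven format: values only, written in schema order."""
    return b"".join(encode_value(record[name]) for name, _, _ in SCHEMA)

def decode_schema_ordered(data):
    """Decode with the schema: names and types come from the schema itself."""
    record, pos = {}, 0
    for name, _, typ in SCHEMA:
        record[name], pos = decode_value(data, pos, typ is str)
    return record

record = {"id": 7, "name": "avro"}
tagged = encode_tagged(record)
plain = encode_schema_ordered(record)
print(decode_tagged(tagged))           # keys are field ids: {1: 7, 2: 'avro'}
print(decode_schema_ordered(plain))    # keys are field names
print(len(tagged), len(plain))         # 11 9
```

The size gap in this toy is just one tag byte per field; how much the real formats differ depends on their actual encodings and on the data.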


-- Owen
