+1

While protocol buffers and Thrift have similar goals, Avro takes a different 
approach to schema evolution and reconciliation. I feel that Avro's tighter 
data layout and schema management are better suited for many of Hadoop's and 
Pig's use cases for large data sets/tables on HDFS. Field ids start to matter 
when millions of objects have hundreds of fields each. There is, of course, the 
storage overhead. Schema management becomes hard, especially if there are cases 
where field ids need to be assigned manually.
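To illustrate the contrast (a hypothetical sketch, not code from the proposal): protocol buffers require a manually assigned numeric id on every field, and a tag derived from that id is written next to each value in the wire format, so per-field overhead grows with the number of fields per record.

```proto
// Hypothetical protocol buffers definition: each field needs a
// hand-assigned numeric id (1, 2, 3), and a tag based on that id
// is serialized alongside every value in every record.
message User {
  required string name  = 1;
  optional int64  id    = 2;
  optional string email = 3;
}
```

A roughly equivalent Avro schema (again, a sketch) is plain JSON with no field ids; the schema is stored once with the data, so each record on disk is just the concatenated field values:

```json
{"type": "record",
 "name": "User",
 "fields": [
   {"name": "name",  "type": "string"},
   {"name": "id",    "type": "long"},
   {"name": "email", "type": "string"}
 ]}
```

With millions of records of hundreds of fields each, dropping the per-field tags is a real storage win, at the cost of needing the schema available to read the data.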


----- Original Message ----
From: Doug Cutting <[email protected]>
To: [email protected]
Sent: Thursday, April 2, 2009 3:05:08 PM
Subject: [PROPOSAL] new subproject: Avro

I propose we add a new Hadoop subproject for Avro, a serialization system.  My 
ambition is for Avro to replace both Hadoop's RPC and to be used for most 
Hadoop data files, e.g., by Pig, Hive, etc.

Initial committers would be Sharad Agarwal and me, both existing Hadoop 
committers.  We are the sole authors of this software to date.

The code is currently at:

http://people.apache.org/~cutting/avro.git/

To learn more:

git clone http://people.apache.org/~cutting/avro.git/ avro
cat avro/README.txt

Comments?  Questions?

Doug



      
