+1

While protocol buffers and Thrift have similar goals, Avro takes a different approach to schema evolution and reconciliation. I feel that Avro's tighter integration of data layout and schema management is better suited to many of Hadoop's and Pig's use cases for large data sets/tables on HDFS. Field ids start to matter when millions of objects have hundreds of fields each: there is, of course, the storage overhead, and schema management becomes hard, especially in cases where field ids need to be assigned manually.
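To make the contrast concrete, here is a simplified sketch (not Avro's actual implementation or API) of the idea behind name-based schema reconciliation: because the writer's schema travels with the data, a reader can match fields by name and fill gaps from reader-side defaults, with no per-field numeric ids stored alongside each value. All names below (`resolve`, the field lists) are hypothetical illustrations.

```python
# Hypothetical sketch of Avro-style schema resolution by field name.
# Fields are (name, default) pairs; default is None when the field has
# no default value. This is illustrative only, not the Avro library.

def resolve(writer_fields, reader_fields, record):
    """Project a record written under writer_fields onto reader_fields."""
    writer_names = {name for name, _ in writer_fields}
    out = {}
    for name, default in reader_fields:
        if name in writer_names:
            out[name] = record[name]    # matched by name, not by numeric id
        elif default is not None:
            out[name] = default         # reader-side default fills the gap
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out

# Writer added a field the reader ignores; reader added one with a default.
writer = [("id", None), ("name", None), ("extra", None)]
reader = [("id", None), ("name", None), ("email", "unknown")]
print(resolve(writer, reader, {"id": 1, "name": "ada", "extra": 42}))
# → {'id': 1, 'name': 'ada', 'email': 'unknown'}
```

By contrast, protocol buffers and Thrift tag each serialized field with its numeric id, which costs space per field per object and requires those ids to be managed consistently across schema versions.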
----- Original Message -----
From: Doug Cutting <[email protected]>
To: [email protected]
Sent: Thursday, April 2, 2009 3:05:08 PM
Subject: [PROPOSAL] new subproject: Avro

I propose we add a new Hadoop subproject for Avro, a serialization system. My ambition is for Avro both to replace Hadoop's RPC and to be used for most Hadoop data files, e.g., by Pig, Hive, etc.

Initial committers would be Sharad Agarwal and me, both existing Hadoop committers. We are the sole authors of this software to date.

The code is currently at:

  http://people.apache.org/~cutting/avro.git/

To learn more:

  git clone http://people.apache.org/~cutting/avro.git/ avro
  cat avro/README.txt

Comments? Questions?

Doug
