Jim Kellerman (POWERSET) wrote:
It is also my understanding (based on the email thread) that Avro only supports Java and Python. That is a step backwards from Thrift.
We intend to add support for more languages. Avro is not complete.
It appears that Avro uses introspection heavily, which is expensive in applications that require a high message rate.
It only uses introspection if you wish to use your existing Java classes to represent Avro data. There are three representations in Java: generic (uses Map<String,Object> for records, List<Object> for arrays), specific (generates a Java class for each Avro record, like Thrift) and reflect (uses reflection to access existing classes). So introspection is optional. And, while introspection is indeed slow for processing file-based data, it would probably not be a bottleneck for most RPC protocols, and it might be a useful tool for migrating existing code to Avro.
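To make the distinction concrete, here is a stdlib-only sketch of the difference between the generic and reflect representations. This is not the Avro API; the `User` class and `reflect_to_generic` helper are hypothetical, used only to contrast dict-based access with introspection over an existing class.

```python
class User:
    """An existing application class, as the reflect representation would see it."""
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Generic representation: a record is just a map keyed by field name,
# so no class needs to be generated or introspected.
generic_user = {"name": "doug", "age": 42}

# Reflect representation: read the same fields off an existing object
# via introspection (getattr), with no hand-written mapping code.
def reflect_to_generic(obj, fields):
    return {f: getattr(obj, f) for f in fields}

# Both routes yield the same record data.
assert reflect_to_generic(User("doug", 42), ["name", "age"]) == generic_user
```

The per-field `getattr` calls are what make the reflect path slower for bulk file processing, while the generic and specific paths avoid introspection entirely.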
So I guess my question is why Avro?
The compelling case is dynamic data types. Pig, Hive, Python, Perl, etc. scripts should not have to generate a Thrift IDL file each time they wish to write a data file with a new schema, nor should they need to run the Thrift compiler for each data file they wish to read. For production applications, code generation is not an imposition and may offer increased opportunities for optimization and error checking, but for exploration and experimentation, a very common use case for Hadoop, one would like to be able to browse datasets and build MapReduce programs more interactively.
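The dynamic-schema workflow above can be sketched with the standard library alone. This is not the real Avro API (Avro stores a binary encoding with the JSON schema in the file header); the point is only that a script can invent a schema at runtime and write self-describing data with no IDL file and no compiler step.

```python
import json

# A script defines a brand-new record schema at runtime -- no .thrift
# file, no code generation.
schema = {
    "type": "record",
    "name": "ClickEvent",
    "fields": [{"name": "url", "type": "string"},
               {"name": "ts", "type": "long"}],
}

records = [{"url": "/home", "ts": 1}, {"url": "/search", "ts": 2}]

# Pair the schema with the data so the output is self-describing.
payload = json.dumps({"schema": schema, "records": records})

# A reader recovers both schema and records, again with no generated classes.
decoded = json.loads(payload)
field_names = [f["name"] for f in decoded["schema"]["fields"]]
assert field_names == ["url", "ts"]
assert decoded["records"][0]["url"] == "/home"
```

Because the schema travels with the data, an interactive tool can browse a dataset it has never seen before, which is exactly the exploration use case described above.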
Doug
