Jim Kellerman (POWERSET) wrote:
It is also my understanding (based on the email thread) that Avro only
supports Java and python. That is a step backwards from Thrift.

We intend to add support for more languages.  Avro is not complete.

It appears that Avro uses introspection heavily, which is expensive in
applications that require a high message rate.

It only uses introspection if you wish to use your existing Java classes to represent Avro data. There are three representations in Java: generic (uses Map&lt;String,Object&gt; for records and List&lt;Object&gt; for arrays), specific (generates a Java class for each Avro record, like Thrift), and reflect (uses reflection to access existing classes). So introspection is optional. And, while introspection is indeed slow for processing file-based data, it would probably not be a bottleneck for most RPC protocols, and it might be a useful tool for migrating existing code to Avro.
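To make the distinction concrete, here is a minimal sketch of the three access styles using only the standard library. This is not the actual Avro API; the class and field names (User, UserRecord, name) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of Avro's three Java representations,
// using only the standard library (not the real Avro classes).
public class Representations {

    // Reflect-style: an existing user class, accessed via introspection.
    static class User {
        public String name = "alice";
    }

    // Specific-style: a generated-looking class with typed accessors,
    // analogous to what the Thrift compiler emits.
    static class UserRecord {
        public String name = "alice";
        public String getName() { return name; }
    }

    // Generic-style: a record is just a Map<String,Object>,
    // so no code generation is needed at all.
    static Map<String, Object> genericRecord() {
        Map<String, Object> record = new HashMap<>();
        record.put("name", "alice");
        return record;
    }

    // Reflect-style access: look the field up by name at runtime.
    static Object reflectGet(Object obj, String fieldName) {
        try {
            return obj.getClass().getField(fieldName).get(obj);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(genericRecord().get("name"));     // generic
        System.out.println(new UserRecord().getName());      // specific
        System.out.println(reflectGet(new User(), "name"));  // reflect
    }
}
```

Only the reflect style pays the reflection cost on each access; generic and specific avoid introspection entirely, which is why it is optional.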

So I guess my question is why Avro?

The compelling case is dynamic data types. Pig, Hive, Python, Perl etc. scripts should not have to generate a Thrift IDL file each time they wish to write a data file with a new schema, nor should they need to run the Thrift compiler for each data file they wish to read. For production applications, code-generation is not an imposition and may offer increased opportunities for optimization and error checking, but for exploration and experimentation, a very common use case for Hadoop, one would like to be able to browse datasets and build mapreduce programs more interactively.
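Since Avro schemas are plain JSON documents, a script can construct one at runtime and hand it directly to the reader or writer, with no IDL file and no compiler invocation. A minimal record schema (field names here are just an example) looks like:

```json
{
  "type": "record",
  "name": "LogEntry",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "message",   "type": "string"}
  ]
}
```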

Doug
