Jim Kellerman (POWERSET) wrote:
It is also my understanding (based on the email thread) that Avro only supports Java and Python. That is a step backwards from Thrift.
We intend to add support for more languages. Avro is not complete.
It appears that Avro uses introspection heavily, which is expensive in applications that require a high message rate.
It only uses introspection if you wish to use your existing Java classes to represent Avro data. There are three representations in Java: generic (uses Map<String,Object> for records, List<Object> for arrays), specific (generates a Java class for each Avro record, like Thrift) and reflect (uses reflection to access existing classes). So introspection is optional. And, while introspection is indeed slow for processing file-based data, it would probably not be a bottleneck for most RPC protocols, and it might be a useful tool for migrating existing code to Avro.
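To make the distinction concrete, here is a stdlib-only sketch of the difference between the generic and reflect representations. This is not the Avro API; the `User` class and `reflect_to_generic` helper are hypothetical, used only to contrast dict-based access with introspection over an existing class.

```python
class User:
    """An existing application class, as the reflect representation would see it."""
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Generic representation: a record is just a map keyed by field name,
# so no class needs to be generated or introspected.
generic_user = {"name": "doug", "age": 42}

# Reflect representation: read the same fields off an existing object
# via introspection (getattr), with no hand-written mapping code.
def reflect_to_generic(obj, fields):
    return {f: getattr(obj, f) for f in fields}

# Both routes yield the same record data.
assert reflect_to_generic(User("doug", 42), ["name", "age"]) == generic_user
```

The per-field `getattr` calls are what make the reflect path slower for bulk file processing, while the generic and specific paths avoid introspection entirely.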
So I guess my question is why Avro?
The compelling case is dynamic data types. Pig, Hive, Python, Perl, etc. scripts should not have to generate a Thrift IDL file each time they wish to write a data file with a new schema, nor should they need to run the Thrift compiler for each data file they wish to read. For production applications, code generation is not an imposition and may offer increased opportunities for optimization and error checking, but for exploration and experimentation, a very common use case for Hadoop, one would like to be able to browse datasets and build MapReduce programs more interactively.
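The dynamic-schema workflow above can be sketched with the standard library alone. This is not the real Avro API (Avro stores a binary encoding with the JSON schema in the file header); the point is only that a script can invent a schema at runtime and write self-describing data with no IDL file and no compiler step.

```python
import json

# A script defines a brand-new record schema at runtime -- no .thrift
# file, no code generation.
schema = {
    "type": "record",
    "name": "ClickEvent",
    "fields": [{"name": "url", "type": "string"},
               {"name": "ts", "type": "long"}],
}

records = [{"url": "/home", "ts": 1}, {"url": "/search", "ts": 2}]

# Pair the schema with the data so the output is self-describing.
payload = json.dumps({"schema": schema, "records": records})

# A reader recovers both schema and records, again with no generated classes.
decoded = json.loads(payload)
field_names = [f["name"] for f in decoded["schema"]["fields"]]
assert field_names == ["url", "ts"]
assert decoded["records"][0]["url"] == "/home"
```

Because the schema travels with the data, an interactive tool can browse a dataset it has never seen before, which is exactly the exploration use case described above.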
Doug
