Ankur Goel wrote:
How fast do we expect the new serialization system to be when it replaces the existing serialization mechanism in Hadoop RPC?
I hope that Avro will make its first release this summer. Sometime soon after, I hope we can start moving Hadoop Core's trunk RPC onto Avro. We may start developing an experimental version of Hadoop Core that uses Avro in a branch before Avro is released. This is all speculative, of course. Any detailed discussion of Hadoop Core's future belongs on core-dev@, and of Avro's future on avro-...@.
A clear description of the existing bottlenecks and the performance goals for this system would help developers interested in contributing.
Adding Avro to Hadoop Core is not primarily about performance but rather about compatibility and security.
Hadoop's existing RPC is not a performance bottleneck, nor is HDFS's data transfer protocol. However, Hadoop currently requires that clients and servers run the exact same version of the code, since the existing RPC is not tolerant of protocol changes. We'd like to change that, so that one can run older clients against newer servers and vice versa. Longer term, we'd also like to permit clients in languages other than Java. We intend Avro to provide a change-tolerant, cross-platform RPC solution.
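To make the compatibility point concrete, here is a minimal sketch of Avro's schema resolution using the current Avro Java API (the "Request" record and its fields are hypothetical, purely for illustration): an older client reads data written by a newer server, silently skipping the field it doesn't know about.

  import java.io.ByteArrayOutputStream;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.io.Decoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.Encoder;
  import org.apache.avro.io.EncoderFactory;

  public class EvolutionSketch {
    // Newer (writer's) schema: adds a "timeout" field with a default.
    static final String NEW_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Request\",\"fields\":["
      + "{\"name\":\"path\",\"type\":\"string\"},"
      + "{\"name\":\"timeout\",\"type\":\"long\",\"default\":0}]}";
    // Older (reader's) schema: knows nothing about "timeout".
    static final String OLD_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Request\",\"fields\":["
      + "{\"name\":\"path\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws Exception {
      Schema writer = new Schema.Parser().parse(NEW_SCHEMA);
      Schema reader = new Schema.Parser().parse(OLD_SCHEMA);

      // A newer server serializes a record with both fields.
      GenericData.Record rec = new GenericData.Record(writer);
      rec.put("path", "/foo");
      rec.put("timeout", 30000L);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      Encoder enc = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericData.Record>(writer).write(rec, enc);
      enc.flush();

      // An older client deserializes it: schema resolution skips the
      // unknown "timeout" field instead of failing on the mismatch.
      Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
      Object result =
          new GenericDatumReader<Object>(writer, reader).read(null, dec);
      System.out.println(result);  // prints {"path": "/foo"}
    }
  }

The "default" on the added field also covers the reverse direction: a newer reader can fill it in when decoding data from an older writer, so both upgrade orders work.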
We'd also like Hadoop to become more secure. Currently Hadoop uses three different communication mechanisms: RPC, HTTP (for the shuffle), and a raw socket-based protocol for HDFS data transfers. It would be best not to have to re-implement security features for each of these. So we hope that we can make Avro perform well enough to replace not only Hadoop's RPC, but also HTTP in the shuffle and the HDFS data transfer protocol.
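As a sketch of what a single wire contract could look like, an Avro protocol is declared in JSON and parsed at runtime; the service and message names below are hypothetical, not anything proposed in this thread.

  import org.apache.avro.Protocol;

  public class ProtocolSketch {
    // Hypothetical protocol: one Avro declaration could describe a
    // service such as HDFS block reads, rather than a hand-rolled
    // socket protocol with its own framing and security code.
    static final String PROTO =
        "{\"protocol\":\"BlockService\",\"namespace\":\"example\","
      + "\"messages\":{\"readBlock\":{"
      + "\"request\":[{\"name\":\"blockId\",\"type\":\"long\"}],"
      + "\"response\":\"bytes\"}}}";

    public static void main(String[] args) {
      Protocol p = Protocol.parse(PROTO);
      System.out.println(p.getMessages().keySet());  // prints [readBlock]
    }
  }

Because the declaration is just data, the same description could drive Java servers and non-Java clients alike.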
If you're interested in discussing Avro further, I encourage you to join the Avro mailing lists.
http://hadoop.apache.org/avro/mailing_lists.html

Doug
