Re: [PROPOSAL] new subproject: Avro

Chad Walters Wed, 08 Apr 2009 00:04:06 -0700

Doug,

After our off-list chat, and given that you have indicated that the design is 
still in flux and that you are open to discussing changes that would permit 
interoperability, I am not as concerned as I was.

My urgency came from concern that once the design was put in place as part of 
an Apache subproject, rather than open sourced in some other less prominent 
forum, it would increase the barrier to interoperability; in particular, I was 
concerned that people would assume the design of the data format was 
fully-baked and start persisting large amounts of data in some early version of 
the format, potentially prematurely ossifying the design in a state unsuited 
for compatibility with Thrift. Given your clarifications around this, my fears 
clearly were not well-founded.

Please accept my apology if I came across as obstructionist. I was honestly 
advocating on behalf of what I believe is in the best interest of our shared user 
base.

Clearly we have some disagreements about the value of some of Thrift's design 
choices and what those mean for various use cases. I think we also have some 
differences of opinion about the relative difficulty of implementation versus 
the value of interoperability. Hopefully, the next few months will afford an 
opportunity to examine the sources of those disagreements and see if they can 
be resolved.

Sincerely,

Chad

----- Original Message ----
From: Doug Cutting <[email protected]>
To: [email protected]
Sent: Tuesday, April 7, 2009 8:33:32 PM
Subject: Re: [PROPOSAL] new subproject: Avro

To be clear, since a few folks have missed this point: Avro is not complete.  
At some point in the future, before people start using it as a format for 
persistent data, we'll need to stop altering its specification, or at least do 
so much more cautiously.  But before then, my immediate goal is to move 
development from private to open so that we have a chance to incorporate 
feedback before we lock down the specification.

For example, several folks have raised the issue of compatibility with Thrift.  
We certainly want to avoid gratuitous incompatibilities.  There are also 
features clearly missing from Avro that we expect to add before we make a 
release, like default values, a more efficient RPC handshake, etc.  And some 
features that we might consider removing, if they're not broadly useful and 
inhibit interoperability, like single-float, which isn't in Thrift, Python, 
etc.  And I expect there will be more such issues raised in the coming weeks 
and months.

But before we can discuss and resolve such issues we need a forum in which to 
do so.  That's all I am after at this point: mailing lists, a bug database, a 
public source code repository, etc., so that we can start accepting patches, 
adding committers, etc.

Three days have now passed since I initially proposed this, the nominal time 
for an Apache vote.  Is there anyone who strongly opposes taking the 
development of Avro public as a Hadoop subproject?  Only PMC votes are binding, 
but I would vastly prefer that the broader community also supports this step in 
the process.

Thanks,

Doug

Doug Cutting wrote:
> I propose we add a new Hadoop subproject for Avro, a serialization system.  
> My ambition is for Avro both to replace Hadoop's RPC and to be used for most 
> Hadoop data files, e.g., by Pig, Hive, etc.
> 
> Initial committers would be Sharad Agarwal and me, both existing Hadoop 
> committers.  We are the sole authors of this software to date.
> 
> The code is currently at:
> 
> http://people.apache.org/~cutting/avro.git/
> 
> To learn more:
> 
> git clone http://people.apache.org/~cutting/avro.git/ avro
> cat avro/README.txt
> 
> Comments?  Questions?
> 
> Doug
