On Dec 7, 2010, at 2:37 PM, Roy T. Fielding wrote:

The proposal is to change the extension mechanism incompatibly with unclear benefits,

Good, these are technical reasons. The benefits can be clarified by docs. By incompatible, I assume you mean forward-compatibility of old versions
of Hadoop reading newer files.  Can we fix that by having the new
implementation use the old file format by default until it is configured
to use one of the new interfaces for writing?


There are two goals here. The first is to extend the serialization plugin interface. The current patch does this in a completely compatible way, including a shim that uses the previous plugins to satisfy the new API. The benefits are also clear: Avro serialization becomes possible where it wasn't before, along with a wide range of other opportunities.
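As a rough illustration of the shim idea (not the code in the patch; every class and method name here is hypothetical, and Python is used only for brevity), an adapter can wrap an old-style plugin so that callers see only the new-style API:

```python
import io


class LegacySerialization:
    """Stand-in for an old-style plugin (hypothetical interface)."""

    def serialize(self, obj):
        return obj.encode("utf-8")

    def deserialize(self, data):
        return data.decode("utf-8")


class LegacyShim:
    """Adapts an old-style plugin to a hypothetical new stream-based API."""

    def __init__(self, legacy):
        self._legacy = legacy

    def write(self, stream, obj):
        # New-style call, delegated to the old plugin's serialize().
        stream.write(self._legacy.serialize(obj))

    def read(self, stream):
        # New-style call, delegated to the old plugin's deserialize().
        return self._legacy.deserialize(stream.read())


def roundtrip(value):
    """Write a value through the shim and read it back."""
    shim = LegacyShim(LegacySerialization())
    buf = io.BytesIO()
    shim.write(buf, value)
    buf.seek(0)
    return shim.read(buf)
```

Under these assumptions, existing plugins keep working unchanged, because the framework only ever talks to the shim.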

The file format was changed to demonstrate that the serialization interface is useful and complete. The file change is also backwards compatible: the new code automatically reads old versions of the file. Old versions of the code will fail with an error message if they are given a file in the new format. This is exactly the pattern we have used in the past.
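The versioning pattern described above can be sketched roughly as follows. This is an illustration only, not SequenceFile's actual reader: the version numbers, the function name, and the exception name are assumptions made for the example.

```python
import io

MAGIC = b"SEQ"  # header prefix assumed for this sketch


class UnsupportedVersionError(Exception):
    """Raised when a file is newer than the reader understands."""


def read_header(stream, max_supported):
    """Return the file version, or fail if the reader is too old for it."""
    header = stream.read(4)
    if len(header) < 4 or header[:3] != MAGIC:
        raise ValueError("not a recognized file")
    version = header[3]
    if version > max_supported:
        # An old reader given a new file complains with a clear message.
        raise UnsupportedVersionError(
            f"file version {version} is newer than supported version {max_supported}"
        )
    # A new reader accepts every version up to its own.
    return version


OLD, NEW = 6, 7  # hypothetical version numbers
old_file = b"SEQ" + bytes([OLD])
new_file = b"SEQ" + bytes([NEW])
```

A reader that supports `NEW` accepts both files, while a reader that supports only `OLD` rejects `new_file` with an error instead of misreading it.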

So, no, there are no technical issues with the patch as it stands.

You keep referring to the kernel as if it were a product.  I don't see
a kernel product in the list of things released by Apache Hadoop.

The kernel is a very loosely defined concept. Utilities that the framework currently uses are "kernel"; others are used only by users. Some classes are clearly kernel and some are clearly library, but some, such as BooleanWritable, aren't obvious. It would take a fair amount of work, and likely some duplication, to segregate out the library code. I also worry that creating such a project would make Hadoop less useful out of the box and decrease the value of the Apache release of Hadoop.

But back to the original point. Doug's (and Tom's) veto was based on:
1. Modification to SequenceFile.
2. It introduces a dependence on Protocol Buffers.

There was strong consensus that SequenceFile is required and should be updated as the framework evolves, so the first is not a valid objection. The second is not a technical reason. I believe that the entire veto should be considered invalid.

-- Owen
