HBase and similar HDFS clients could benefit from a high-performance, stable datacenter network protocol built into the namenode and datanodes. Then we could decouple from the Hadoop versioning and release cycle, HDFS could decouple from core, etc.
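To make the decoupling idea concrete, here is a minimal sketch (not actual Hadoop code; `VersionedEnvelope`, the magic number, and the version field are all invented for illustration) of the kind of self-describing wire envelope a stable protocol implies: the version travels ahead of the payload, so a peer on a different release can detect a mismatch instead of silently misreading bytes.

```java
import java.io.*;

// Hypothetical sketch: a self-describing wire envelope. A magic number and
// protocol version precede the payload, so clients and servers built against
// different releases can negotiate or fail cleanly.
public class VersionedEnvelope {
    static final int MAGIC = 0x48445253;   // arbitrary marker chosen for this sketch
    static final int CURRENT_VERSION = 2;  // invented version number

    static byte[] encode(int version, byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(MAGIC);
        out.writeInt(version);
        out.writeInt(payload.length);
        out.write(payload);
        return buf.toByteArray();
    }

    static byte[] decode(byte[] wire, int maxSupportedVersion) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        if (in.readInt() != MAGIC)
            throw new IOException("not a recognized wire format");
        int version = in.readInt();
        if (version > maxSupportedVersion)
            throw new IOException("peer speaks version " + version
                + ", we support up to " + maxSupportedVersion);
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a fake operation name through the envelope.
        byte[] wire = encode(CURRENT_VERSION, "getBlockLocations".getBytes("UTF-8"));
        byte[] payload = decode(wire, CURRENT_VERSION);
        System.out.println(new String(payload, "UTF-8"));
    }
}
```

The point of the sketch is only that once the version is explicit on the wire, the client library and the cluster can evolve on separate release cycles.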
Whatever stable network protocol is devised, if any, should of course perform at least as well as the current one. A stable but lower-performing option, unfortunately, would be excluded from consideration right away.

HBase is perhaps a bit of a special case at the moment, in that its access pattern is random read/write, and there may be only a handful of clients like that. However, if HDFS is positioned as a product in its own right -- which I believe has been the case since the split -- there may be many other potential users of it, for all of its benefits, given a stable wire format that enables decoupled development.

API compatibility +1
Data compatibility +1
Wire compatibility +1

Best regards,

Andrew Purtell
Committing Member, HBase Project: hbase.org

________________________________
From: Steve Loughran <ste...@apache.org>
To: common-dev@hadoop.apache.org
Sent: Monday, September 28, 2009 3:15:09 AM
Subject: Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0 or something later than that.
> API compatibility +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0

That's stability of the job submission network protocol you are looking for there.

* We need a job submission API that is designed to work over long-haul links and versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker.
An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user rights are a trouble spot.

Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or WebDAV. Again, there is no need to build it into the namenode.

See http://www.slideshare.net/steve_l/long-haul-hadoop and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
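The bridge idea sketched above can be shown in miniature: a standalone process exposes a stable HTTP face to long-haul clients while speaking whatever the cluster speaks internally. This is a hypothetical sketch using only the JDK's built-in `com.sun.net.httpserver` server; the "cluster side" is faked with a canned response, where a real bridge would call into the HDFS client library.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.*;
import java.net.InetSocketAddress;
import java.net.URL;

// Hypothetical sketch of a long-haul bridge: stable HTTP on the outside,
// whatever unstable datacentre protocol on the inside. The back end here
// is a canned byte array standing in for a real HDFS read.
public class LongHaulBridge {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/data/", ex -> {
            // A real bridge would read the requested path via the HDFS
            // client library here; we fake the cluster-side call.
            byte[] body = "hello from the cluster".getBytes("UTF-8");
            ex.sendResponseHeaders(200, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // A long-haul client needs only plain HTTP -- no Hadoop jars,
        // no knowledge of the in-cluster wire format.
        URL url = new URL("http://localhost:" + port + "/data/part-00000");
        BufferedReader r = new BufferedReader(
            new InputStreamReader(url.openStream(), "UTF-8"));
        System.out.println(r.readLine());
        server.stop(0);
    }
}
```

Because the bridge runs outside the namenode, it can version its HTTP face independently of the cluster; authentication, as the message notes, is the hard part and is omitted here.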