HBase and similar HDFS clients could benefit from a high-performance,
stable datacenter network protocol that is built into the namenode and
datanodes. Then we could decouple from the Hadoop versioning and release
cycle. HDFS could decouple from core, etc.

Whatever stable network protocol is devised, if any, should of course
perform at least as well as the current one. A stable but lower
performing option, unfortunately, would be excluded from consideration
right away.

HBase is perhaps a bit of a special case at present, in that its access
pattern is random read/write, and there may be only a handful of clients
like that. However, if HDFS is positioned as a product in its own right,
which I believe is the case since the split, there may be many other
potential users of it -- for all of its benefits -- given a stable
wire format that enables decoupled development.

API compatibility  +1
Data compatibility +1
Wire compatibility +1

Best regards,

Andrew Purtell
Committing Member, HBase Project: hbase.org

________________________________
From: Steve Loughran <ste...@apache.org>
To: common-dev@hadoop.apache.org
Sent: Monday, September 28, 2009 3:15:09 AM
Subject: Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0  or something later than that.
> API compatibility  +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0


That's stability of the job submission network protocol you are looking for 
there.
* We need a job submission API that is designed to work over long-haul links 
and across versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker. An independent service 
bridging the stable long-haul API to an unstable datacentre protocol does work, 
though authentication and user rights are a trouble spot

Similarly, it would be good to have a stable long-haul HDFS protocol, such as 
FTP or WebDAV. Again, there is no need to build it into the namenode.

see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop



      
