Cool stuff, thanks!

David
On Thu, Feb 26, 2009 at 11:13 AM, Jim Kellerman (POWERSET) <[email protected]> wrote:

> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > On Behalf Of David Van Couvering
> > Sent: Thursday, February 26, 2009 10:18 AM
> > To: [email protected]
> > Subject: HBase and failure notification
> >
> > Hey, all. I'm doing a bit of a survey of distributed key/value
> > stores out there. HBase looks pretty interesting, nice to see an
> > open source version of BigTable out there.
> >
> > HBase is obviously clustered, but what I can't figure out is how it
> > does cluster management. It looks like you have to configure it to
> > tell it all the machines that have region servers, and that implies
> > to me that *you* have to start and manage the region servers - HBase
> > doesn't do any of that for you.
>
> There are start and stop scripts that will start up the master and
> region servers.
>
> > So I think that means that it doesn't have any node monitoring
> > support - you have to have your own monitoring system that detects
> > failed nodes and notifies you and/or restarts them for you.
>
> HBase has a web UI that you can use to monitor the state of the
> cluster. The master does detect when a region server becomes
> unreachable.
>
> If you mean machine failure, HBase does not have built-in monitoring,
> but you can use Ganglia to monitor the hardware status. HBase can also
> feed metrics to Ganglia.
>
> > Also, the architecture document says "if [the master server] detects
> > a HRegionServer is no longer reachable, it will split the
> > HRegionServer's write-ahead log so that there is now one write-ahead
> > log for each region that the HRegionServer was serving. After it has
> > accomplished this, it will reassign the regions that were being
> > served by the unreachable HRegionServer"
> >
> > This seems to imply that even though the HRegionServer is
> > unreachable, somehow its write-ahead log and the regions it was
> > serving are. Perhaps I don't fully understand HDFS, but is this a
> > guarantee when the node hosting the HRegionServer is down? What
> > happens if you can't get to the write-ahead log and/or some of the
> > regions the region server was serving?
>
> HDFS replicates data to multiple machines (3 by default), so unless
> you have a catastrophic outage, it is very unlikely that the data will
> be completely unreachable.
>
> > Thanks,
> >
> > David
> >
> > --
> > David W. Van Couvering
> >
> > I am looking for a senior position working on server-side Java
> > systems. Feel free to contact me if you know of any opportunities.
> >
> > http://www.linkedin.com/in/davidvc
> > http://davidvancouvering.blogspot.com
> > http://twitter.com/dcouvering
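
For anyone reading this thread later: the log-splitting step quoted from the architecture document can be pictured roughly like this. It is only a toy sketch in Python, not HBase code. The `split_log` function, the `(region, edit)` tuple layout, and the region names are all made up for illustration; the real write-ahead log format and the master's actual procedure are not shown in this thread.

```python
from collections import defaultdict


def split_log(entries):
    """Group one server-wide write-ahead log into one log per region.

    `entries` is an iterable of (region_name, edit) tuples in write order.
    Hypothetical data model for illustration only, not HBase's log format.
    """
    per_region = defaultdict(list)
    for region, edit in entries:
        # Appending in input order keeps each region's edits in write order.
        per_region[region].append(edit)
    return dict(per_region)


# A failed server's single log, interleaving edits for two regions.
shared_log = [
    ("region-a", "put row1"),
    ("region-b", "put row9"),
    ("region-a", "delete row2"),
]
logs = split_log(shared_log)
```

After the split, each region has its own list of edits, so whichever server is handed a region only needs to replay that region's log. As Jim notes, this only works because the log itself lives in HDFS and is replicated, so it stays readable when the original server is down.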
