The first step for us is failover. How do we get hbase up and running quickly if the master goes down? Also, how do we balance load?
Am I missing something, or does using hbase to drive a live site violate many of the assumptions behind hadoop/mapred, which was designed to be a great parallelization framework for background tasks? I assume Yahoo and Google use the dfs/gfs in both modes and have therefore solved the failover and load balancing issues. Maybe the committers could comment on where they plan to go with this?

On 12/20/07 10:29 AM, "Billy" <[EMAIL PROTECTED]> wrote:

> I agree. I do not think hadoop/hbase can become production level without a
> means to back up the data with snapshots or some other way to do
> point-in-time backups that can restore a cluster to the state it was in
> when the backup was taken.
>
> I think an acceptable level of backup is storing the data within hadoop as
> one or more files, but there should be some kind of safeguard to keep the
> backups from getting deleted or corrupted. It's been a while since I read
> Google's GFS paper, but I think that's how they do it: just take a snapshot
> and store it within the cluster on their GFS.
>
> Billy
>
> "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> agreed - i think anyone who is thinking of using hadoop as a place from
>> which data is served has to be disturbed by the lack of data protection.
>>
>> replication in hadoop provides protection against hardware failures, not
>> software failures. backups (and, depending on how they are implemented,
>> snapshots) protect against errant software. we have seen evidence of the
>> namenode going haywire and causing block deletions/file corruptions at
>> least once, and we have seen more reports of the same nature on this list.
>> i don't think hadoop (and hbase) can reach their full potential without a
>> safeguard against software corruptions.
>>
>> this question came up a couple of days back as well. one option is
>> switching over to solaris+zfs as a way of taking data snapshots. the other
>> option is having two hdfs instances (ideally running different versions)
>> and replicating data between them. both have clear downsides.
>>
>> (i don't think the traditional notion of backing up to tape - or even
>> virtual tape, which is really what our filers are becoming - is worth
>> discussing. for large data sets the restore time would be so bad as to
>> render these useless as a recovery path.)
>>
>> ________________________________
>>
>>> From: Pat Ferrel [mailto:[EMAIL PROTECTED]]
>>> Sent: Thu 12/20/2007 7:25 AM
>>> To: hadoop-user@lucene.apache.org
>>> Subject: Re: Some Doubts of hadoop functionality
>>>
>>>>> 2. If hadoop is configured as a multinode cluster (with one machine as
>>>>> namenode and jobtracker and the other machines as slaves, and the
>>>>> namenode acting as a slave node as well), how are namenode failovers
>>>>> handled?
>>>>
>>>> There are backup mechanisms that you can use to allow you to rebuild the
>>>> name node. There is no official solution for the high availability
>>>> problem. Most hadoop systems work on batch problems where an hour or two
>>>> of downtime every few years is not a problem.
>>>
>>> Actually we were thinking of the product of many mapreduce tasks as
>>> needing high availability. In other words, you can handle downtime in
>>> creating the database but not so much in serving it up. If hbase is the
>>> source from which we build pages then downtime is more of a problem. If
>>> anyone is thinking about an unofficial solution we'd be interested.
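An aside from me on Joydeep's second option above (two hdfs instances with
data replicated between them): that is roughly the job distcp is meant for. A
minimal sketch, assuming a second cluster whose namenode answers at
backup-nn:9000; the host names, port, and paths below are made up for
illustration only:

    # run from a node that can reach both namenodes; copies one day's data
    # from the live cluster to the backup cluster
    bin/hadoop distcp \
        hdfs://live-nn:9000/user/data/2007-12-19 \
        hdfs://backup-nn:9000/backups/2007-12-19

If the two instances run different hadoop versions, as Joydeep suggests, a
plain hdfs:// source may not work across incompatible RPC versions, so it is
worth checking whether your release offers an HTTP-based read-only scheme
that distcp can copy from before counting on this.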
>>>
>>> On 12/20/07 12:05 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>>>
>>>> On 12/19/07 11:17 PM, "M.Shiva" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> 1. Are separate machines/nodes needed for the Namenode, Jobtracker, and
>>>>> slave nodes?
>>>>
>>>> No. I run my namenode and job-tracker on one of my storage/worker nodes.
>>>> You can run everything on a single node and still get some interesting
>>>> results because of the discipline imposed by map-reduce programming.
>>>>
>>>> BUT... running this stuff on separate nodes is the POINT of hadoop.
>>>>
>>>>> 2. If hadoop is configured as a multinode cluster (with one machine as
>>>>> namenode and jobtracker and the other machines as slaves, and the
>>>>> namenode acting as a slave node as well), how are namenode failovers
>>>>> handled?
>>>>
>>>> There are backup mechanisms that you can use to allow you to rebuild the
>>>> name node. There is no official solution for the high availability
>>>> problem. Most hadoop systems work on batch problems where an hour or two
>>>> of downtime every few years is not a problem.
>>>>
>>>>> 3. This question is inter-related with the second question. In case of
>>>>> namenode failover, can slave nodes be configured to act as a namenode
>>>>> themselves and take control of the other slave nodes?
>>>>
>>>> No. You have to actually take specific action to bring up a new name
>>>> node. This isn't hard, though.
>>>>
>>>>> 4. If I rebuild the namenode after a failover, how is the old multinode
>>>>> cluster setup reproduced? How do I rebuild the same multinode cluster
>>>>> setup as the previous one?
>>>>
>>>> I clearly don't understand this question because the answer seems
>>>> obvious. If you build a new namenode and job tracker that have the same
>>>> configuration as the old ones, then you have a replica of the old
>>>> cluster. What is the question?
>>>>
>>>>> 5. Can we back up and restore the files written to hadoop?
>>>>
>>>> Obviously, yes.
>>>>
>>>> But, again, the point of hadoop's file system is that it makes this
>>>> largely unnecessary because of file replication.
>>>>
>>>>> 6. There is no possibility of rewriting the same file in hadoop (HDFS).
>>>>
>>>> This isn't a question. Should it have been?
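One more note on the "backup mechanisms" Ted mentions for rebuilding the name
node: in practice they come down to keeping a second copy of the namenode's
image and edit log somewhere off the machine. A rough sketch of one way to do
that, assuming your hadoop version accepts a comma-separated list for
dfs.name.dir and that /mnt/nfs/hadoop/name is an NFS mount (the property
behaviour and every path here are assumptions to check against your release,
not something we have tested):

    ## in hadoop-site.xml on the namenode: write the image and edit log to a
    ## local disk and to the NFS mount at the same time (paths are examples)
    #  <property>
    #    <name>dfs.name.dir</name>
    #    <value>/local/hadoop/name,/mnt/nfs/hadoop/name</value>
    #  </property>

    ## rough recovery on a replacement machine: same hadoop version and
    ## hadoop-site.xml already installed, box brought up under the old
    ## namenode's hostname (or fs.default.name updated on every node)
    cp -r /mnt/nfs/hadoop/name /local/hadoop/name
    bin/start-dfs.sh    # datanodes re-register and report their blocks

That still leaves Joydeep's point standing: this protects the metadata
against losing the namenode machine, not the data against errant software.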