Clint Morgan wrote:
Actually, I think a simpler approach will get what we want here:
Give hbase a custom filesystem which writes to hdfs, then to s3, but
reads just from hdfs.

That's an interesting idea, Clint. What would you call it? (hdfs3?)
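
For the record, here's a rough sketch of what such a thing could look like -- not a real org.apache.hadoop.fs.FileSystem implementation, just the dual-write/read-from-hdfs idea as a plain helper class. The DualWriteFs name and its methods are made up for illustration:

// Sketch only: writes go to both hdfs and s3, reads come from hdfs alone.
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class DualWriteFs {
  private final FileSystem hdfs;
  private final FileSystem s3;

  public DualWriteFs(Configuration conf, URI hdfsUri, URI s3Uri) throws IOException {
    this.hdfs = FileSystem.get(hdfsUri, conf);
    this.s3 = FileSystem.get(s3Uri, conf);
  }

  /** Write the content under the same path on both filesystems. */
  public void write(Path path, InputStream content) throws IOException {
    FSDataOutputStream hdfsOut = hdfs.create(path);
    try {
      IOUtils.copyBytes(content, hdfsOut, 4096, false);
    } finally {
      hdfsOut.close();
    }
    // Mirror what we just wrote to s3 by re-reading it from hdfs.
    FSDataInputStream in = hdfs.open(path);
    FSDataOutputStream s3Out = s3.create(path);
    IOUtils.copyBytes(in, s3Out, 4096, true); // closes both streams
  }

  /** Reads always come from hdfs. */
  public FSDataInputStream open(Path path) throws IOException {
    return hdfs.open(path);
  }
}

The trade-off is that a write isn't done until both copies are in place, so s3 latency shows up on the write path.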

...
On Thu, May 1, 2008 at 5:14 PM, Clint Morgan <[EMAIL PROTECTED]> wrote:
 What do you need for HBASE-50? Is it sufficient to force the cluster to go
 read-only, flushing all in-memory content, while the copy runs?

 Hopefully we can minimize the time we are read-only. We'd like the
 system to behave as close to normally as possible while snapshotting.
 Is the only danger of allowing writes that some new hstores may get
 written so that we don't get a consistent view? Could we solve that by
 only copying files that were created before the time when flushing
 completed? Or just taking a listing at this point and only copying
 from this listing?

HBase also removes files (for example, after a compaction it removes the old store files, and after a split is done with its parent, the parent is removed). You could make a manifest of all files at the time of the snapshot, but we'd have to do something like rename files slated for deletion, adding a '.deleted' suffix, or use something like hard links -- do these exist in hdfs? -- so that deleted files would still be available to the copy task when it gets around to the copy.
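
To make the manifest idea concrete, here's a minimal sketch assuming we adopt the '.deleted' rename convention; the SnapshotManifest class and its method names are made up here, not existing HBase code:

// Sketch of taking a file manifest at snapshot time and renaming instead of
// deleting while the copy runs.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotManifest {

  /** List every file under the table directory at snapshot time; the copy
   *  task later copies only paths from this manifest. */
  public static List<Path> takeManifest(FileSystem fs, Path tableDir) throws IOException {
    List<Path> manifest = new ArrayList<Path>();
    collect(fs, tableDir, manifest);
    return manifest;
  }

  private static void collect(FileSystem fs, Path dir, List<Path> manifest) throws IOException {
    FileStatus[] statuses = fs.listStatus(dir);
    if (statuses == null) return;
    for (FileStatus status : statuses) {
      if (status.isDir()) {
        collect(fs, status.getPath(), manifest);
      } else {
        manifest.add(status.getPath());
      }
    }
  }

  /** While a snapshot is in progress, rename instead of delete so the copy
   *  task can still read the file. */
  public static void markDeletedInsteadOfRemove(FileSystem fs, Path file) throws IOException {
    Path renamed = new Path(file.getParent(), file.getName() + ".deleted");
    if (!fs.rename(file, renamed)) {
      throw new IOException("Failed to rename " + file + " to " + renamed);
    }
  }
}

The copy task would have to fall back to the '.deleted' name when a path from the manifest has been renamed out from under it, and a later cleanup pass would remove the '*.deleted' files once the copy completes.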

 It would be nice for hbase to provide the orchestration when we need
 to restore from a snapshot. (Taking regions offline, copying over the
 appropriate parts of region and META, etc.).

Yes. We'd want a tool that ran the 'fix' from the backup and that verified and repaired the install.

 At this point I'm still not sure what types of failures to plan for.
 Any input on the sort of failures we should expect w.r.t. data loss and
 corruption? Obviously name node failure, which we would handle with a
 secondary name node. We should be able to recover from that just by
 bringing hdfs back online. So I guess the main concern is that we get a
 corrupted hdfs file.
Can you tolerate empty write-ahead logs? That is, loss of the in-memory content on a regionserver crash, because we don't yet have HADOOP-1700?


Otherwise, from what I've seen, failures generally come from our having problems writing hdfs.

St.Ack
