I like this idea: putting major cluster events in some form into ZK. It could be used for jobs as Todd says, and could also drive a cluster history report on the web UI and such, a higher-level historian.
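To make the counter part concrete, here's a rough sketch (not real HBase code; the znode path, class, and method names are all made up) of how the master might bump a crash counter with the plain ZooKeeper client, and how a job could read it before and after a run:

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class RsCrashCounter {
  // Hypothetical znode; not a path HBase actually uses.
  private static final String PATH = "/hbase/rs-crashes";

  // Master-side: bump the counter when a region server is declared dead.
  // Uses ZooKeeper's versioned setData so concurrent increments aren't lost.
  public static long increment(ZooKeeper zk)
      throws KeeperException, InterruptedException {
    while (true) {
      try {
        Stat stat = new Stat();
        byte[] data = zk.getData(PATH, false, stat);
        long next = Long.parseLong(new String(data, StandardCharsets.UTF_8)) + 1;
        zk.setData(PATH, Long.toString(next).getBytes(StandardCharsets.UTF_8),
            stat.getVersion());
        return next;
      } catch (KeeperException.NoNodeException e) {
        try {
          // First crash ever observed: create the znode with an initial count.
          zk.create(PATH, "1".getBytes(StandardCharsets.UTF_8),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          return 1;
        } catch (KeeperException.NodeExistsException raced) {
          // Another process created it first; retry the increment.
        }
      } catch (KeeperException.BadVersionException raced) {
        // Someone incremented between our read and write; retry.
      }
    }
  }

  // Job-side: read the counter before and after a run; if it moved,
  // a region server died mid-job and WAL-less edits may be gone.
  public static long read(ZooKeeper zk)
      throws KeeperException, InterruptedException {
    try {
      byte[] data = zk.getData(PATH, false, null);
      return Long.parseLong(new String(data, StandardCharsets.UTF_8));
    } catch (KeeperException.NoNodeException e) {
      return 0; // No crashes recorded yet.
    }
  }
}

The same read/compare-version trick would work for richer event records too, not just a bare count.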
I'm a fan of anything that moves us away from having to parse hundreds or thousands of lines of logs to see what has happened.

JG

> -----Original Message-----
> From: Todd Lipcon [mailto:t...@cloudera.com]
> Sent: Tuesday, April 06, 2010 9:49 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: Should HTable.put() return a Future?
>
> On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
> > Yes it is, you will be missing a RS ;)
>
> How do you detect this, though?
>
> It might be useful to add a counter in ZK for region server crashes. If
> the master ever notices that a RS goes down, it increments it. Then we
> can check the before/after for a job and know when we might have lost
> some data.
>
> -Todd
>
> > General rule when uploading without the WAL is that if there's a
> > failure, the job is screwed, and that's the tradeoff for speed.
> >
> > J-D
> >
> > On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <t...@cloudera.com> wrote:
> > > On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > >
> > >> The issue isn't with the write buffer here, it's the WAL. Your edits
> > >> are in the MemStore, so as far as your clients can tell, the data is
> > >> all persisted. In this case you would need to know when all the
> > >> memstores that contain your data are flushed... Best practice when
> > >> turning off the WAL is to force flush the tables after the job is
> > >> done, else you can't guarantee durability for the last edits.
> > >>
> > >
> > > You still can't guarantee durability for any of the edits, since a
> > > failure in the middle of your job is undetectable :)
> > >
> > > -Todd
> > >
> > >> J-D
> > >>
> > >> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.geo...@gmail.com> wrote:
> > >> > Hi,
> > >> >
> > >> > I have an issue where I do a bulk import, and since the WAL is off
> > >> > and the default write buffer is used (TableOutputFormat), I am
> > >> > running into situations where the MR job completes successfully
> > >> > but not all data is actually stored. The issue seems to be a
> > >> > failure on the RS side: it cannot flush the write buffers because
> > >> > the MR job overloads the cluster (usually the .META.-hosting RS is
> > >> > the breaking point) or causes the underlying DFS to go slow, and
> > >> > that has repercussions all the way up to the RSs.
> > >> >
> > >> > My question is, would it make sense, as with any other
> > >> > asynchronous IO, to return a Future from put() that helps check
> > >> > the status of the actual server-side async flush operation? Or am
> > >> > I misguided here? Please advise.
> > >> >
> > >> > Lars
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
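For what it's worth, J-D's "force flush the tables after the job" advice comes down to a few lines against the admin API. A rough sketch, assuming the HBaseAdmin.flush() call from the client API of that era and a placeholder table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushAfterBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // "usertable" is a placeholder for whatever table TableOutputFormat
    // wrote to. Note flush() only *requests* a flush from the region
    // servers, so this narrows the durability window rather than closing
    // it, which is Todd's point above.
    admin.flush("usertable");
  }
}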
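And for the record, the API Lars is asking about might look something like this. It's not an existing HTable method, just a sketch of the shape a Future-returning put could take:

import java.io.IOException;
import java.util.concurrent.Future;

import org.apache.hadoop.hbase.client.Put;

// Hypothetical interface, sketched for discussion; HTable offers nothing
// like this today.
public interface AsyncPuts {
  // Returns immediately; the Future completes once the edit has actually
  // been applied (and flushed) server side, and a flush failure would
  // surface as an exception from Future.get().
  Future<Void> put(Put put) throws IOException;
}

The open question in the thread is whether the server side can even report that completion cheaply while the WAL is off.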