On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Yes it is, you will be missing a RS ;)
>

How do you detect this, though? It might be useful to add a counter in ZK
for region server crashes. If the master ever notices that a RS goes down,
it increments it. Then we can check the before/after for a job and know
when we might have lost some data.

-Todd

> General rule when uploading without WAL is if there's a failure, the
> job is screwed and that's the tradeoff for speed.
>
> J-D
>
> On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <t...@cloudera.com> wrote:
> > On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >
> >> The issue isn't with the write buffer here, it's the WAL. Your edits
> >> are in the MemStore so as far as your clients can tell, the data is
> >> all persisted. In this case you would need to know when all the
> >> memstores that contain your data are flushed... Best practice when
> >> turning off WAL is force flushing the tables after the job is done,
> >> else you can't guarantee durability for the last edits.
> >>
> >
> > You still can't guarantee durability for any of the edits, since a failure
> > in the middle of your job is undetectable :)
> >
> > -Todd
> >
> >> J-D
> >>
> >> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.geo...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I have an issue where I do bulk import and since WAL is off and a
> >> > default write buffer is used (TableOutputFormat) I am running into
> >> > situations where the MR job completes successfully but not all data is
> >> > actually restored. The issue seems to be a failure on the RS side as
> >> > it cannot flush the write buffers because the MR overloads the cluster
> >> > (usually the .META. hosting RS is the breaking point) or causes the
> >> > underlying DFS to go slow and that has repercussions all the way up to
> >> > the RS's.
> >> >
> >> > My question is, would it make sense as with any other asynchronous IO
> >> > to return a Future from the put() that will help checking the status
> >> > of the actual server side async flush operation? Or am I misguided
> >> > here? Please advise.
> >> >
> >> > Lars
> >> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
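
To make the crash-counter idea from the top of this message a bit more concrete, here is a rough sketch in plain Java. It is not HBase or ZooKeeper code. No such counter exists today as far as this thread goes, so the LongSupplier below is only a hypothetical stand-in for "read the region server crash count from ZK", and CrashCounterGuard and GuardedResult are made-up names. The only point is the before/after comparison around a job.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Result of a job together with how much the crash counter moved while it ran. */
final class GuardedResult<T> {
    final T value;
    final long crashesDuringJob;

    GuardedResult(T value, long crashesDuringJob) {
        this.value = value;
        this.crashesDuringJob = crashesDuringJob;
    }

    /**
     * Any change in the counter is treated as suspicious. A non-zero delta
     * only says a region server went down somewhere in the cluster, not that
     * this particular job lost edits, so this is a conservative signal.
     */
    boolean possiblyLostData() {
        return crashesDuringJob != 0;
    }
}

/** Hypothetical helper: brackets a job with two reads of a crash counter. */
final class CrashCounterGuard {
    private CrashCounterGuard() {}

    static <T> GuardedResult<T> run(LongSupplier crashCounter, Supplier<T> job) {
        long before = crashCounter.getAsLong(); // assumed: read of the proposed ZK counter
        T value = job.get();                    // the WAL-less upload would run here
        long after = crashCounter.getAsLong();
        return new GuardedResult<>(value, after - before);
    }
}
```

A caller would rerun the upload (or at least flag it) whenever possiblyLostData() returns true. Since the counter would be cluster-wide, this can report false positives for jobs that never touched the crashed region server, which seems like the right direction to err in when the WAL is off.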
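
On Lars's question about returning a Future from put(): the sketch below shows what that could look like for a client-side write buffer, using only the JDK. It is not the HBase client API. FutureBufferedWriter and its sink callback are hypothetical, with the sink standing in for whatever sends a batch to the region server. Note that, per J-D's point quoted above, a completed future here would only mean the batch left the client buffer. With the WAL off, the edits would still sit in the MemStore until a flush, so this does not by itself give durability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical buffered writer whose put() reports the fate of its batch. */
final class FutureBufferedWriter<E> {
    private final int capacity;
    private final Consumer<List<E>> sink; // assumed stand-in for the batch send to the server
    private final List<E> buffer = new ArrayList<>();
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();

    FutureBufferedWriter(int capacity, Consumer<List<E>> sink) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Buffers the edit; the returned future completes when its batch is sent or fails. */
    synchronized CompletableFuture<Void> put(E edit) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        buffer.add(edit);
        pending.add(result);
        if (buffer.size() >= capacity) {
            flush();
        }
        return result;
    }

    /** Sends everything buffered so far and settles the matching futures. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<E> batch = new ArrayList<>(buffer);
        List<CompletableFuture<Void>> waiters = new ArrayList<>(pending);
        buffer.clear();
        pending.clear();
        try {
            sink.accept(batch);
            for (CompletableFuture<Void> waiter : waiters) {
                waiter.complete(null);
            }
        } catch (RuntimeException e) {
            // A failed send fails every put that was part of this batch.
            for (CompletableFuture<Void> waiter : waiters) {
                waiter.completeExceptionally(e);
            }
        }
    }
}
```

A job could collect the futures and check them (for example with CompletableFuture.allOf) before reporting success, instead of finishing cleanly while some batches were never accepted.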