On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> The issue isn't with the write buffer here, it's the WAL. Your edits
> are in the MemStore so as far as your clients can tell, the data is
> all persisted. In this case you would need to know when all the
> memstores that contain your data are flushed... Best practice when
> turning off WAL is force flushing the tables after the job is done,
> else you can't guarantee durability for the last edits.
>

You still can't guarantee durability for any of the edits, since a failure
in the middle of your job is undetectable :)

-Todd

> J-D
>
> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.geo...@gmail.com> wrote:
> > Hi,
> >
> > I have an issue where I do bulk import and since WAL is off and a
> > default write buffer used (TableOutputFormat) I am running into
> > situations where the MR job completes successfully but not all data is
> > actually restored. The issue seems to be a failure on the RS side as
> > it cannot flush the write buffers because the MR overloads the cluster
> > (usually the .META: hosting RS is the breaking point) or causes the
> > underlying DFS to go slow and that repercussions all the way up to the
> > RS's.
> >
> > My question is, would it make sense as with any other asynchronous IO
> > to return a Future from the put() that will help checking the status
> > of the actual server side async flush operation? Or am I misguided
> > here? Please advise.
> >
> > Lars
> >
>

--
Todd Lipcon
Software Engineer, Cloudera