The issue here isn't the write buffer, it's the WAL. Your edits are in the MemStore, so as far as your clients can tell, the data is all persisted. With the WAL off, you would need to know when all the MemStores that contain your data have been flushed... Best practice when turning off the WAL is to force a flush of the tables after the job is done; otherwise you can't guarantee durability for the last edits.
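To make the argument concrete, here is a toy sketch in Java. It is not HBase code: `ToyRegion`, `crashAndRecover()` and the maps standing in for the MemStore, the store files and the WAL are all made up for illustration. It only shows why an edit written with the WAL off is readable right away but does not survive a region server failure until a flush has happened.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a single region. Hypothetical names, not the HBase API. */
public class WalOffDemo {

    static class ToyRegion {
        // In-memory edits; lost if the server dies.
        private final Map<String, String> memStore = new HashMap<>();
        // Stands in for flushed files on the DFS; survives a crash.
        private final Map<String, String> storeFiles = new HashMap<>();
        // Stands in for the write-ahead log; survives a crash.
        private final Map<String, String> wal = new HashMap<>();

        void put(String row, String value, boolean writeToWal) {
            if (writeToWal) {
                wal.put(row, value);
            }
            memStore.put(row, value);
        }

        /** Reads see the MemStore first, so unflushed edits look persisted. */
        String get(String row) {
            String value = memStore.get(row);
            return value != null ? value : storeFiles.get(row);
        }

        /** Moves everything in the MemStore to durable storage. */
        void flush() {
            storeFiles.putAll(memStore);
            memStore.clear();
            wal.clear(); // flushed edits no longer need the log
        }

        /** Simulates a server failure followed by log replay. */
        void crashAndRecover() {
            memStore.clear();
            memStore.putAll(wal); // only edits that went to the WAL come back
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();

        region.put("row1", "a", false);            // WAL off
        System.out.println("before crash: " + region.get("row1"));
        region.crashAndRecover();
        System.out.println("after crash: " + region.get("row1"));

        region.put("row2", "b", false);            // WAL off again
        region.flush();                            // forced flush after the "job"
        region.crashAndRecover();
        System.out.println("after flush + crash: " + region.get("row2"));
    }
}
```

Running `main` prints `before crash: a`, `after crash: null` and `after flush + crash: b`. In the real client the forced flush would be something along the lines of `HBaseAdmin.flush(tableName)` once the MR job has finished, but check the API of the version you are running.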
J-D

On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.geo...@gmail.com> wrote:
> Hi,
>
> I have an issue where I do bulk import and since WAL is off and a
> default write buffer used (TableOutputFormat) I am running into
> situations where the MR job completes successfully but not all data is
> actually restored. The issue seems to be a failure on the RS side as
> it cannot flush the write buffers because the MR overloads the cluster
> (usually the .META: hosting RS is the breaking point) or causes the
> underlying DFS to go slow and that repercussions all the way up to the
> RS's.
>
> My question is, would it make sense as with any other asynchronous IO
> to return a Future from the put() that will help checking the status
> of the actual server side async flush operation? Or am I misguided
> here? Please advise.
>
> Lars
>