Generally I can't agree that turning off the WAL is a good idea. You get speed, but at what cost? It also amounts to punting on making HLog fast, and reduces the incentive to do so. I think a single flush/sync per batch-put call will improve speed to the point where running with the WAL turned off will be of minimal value.

-ryan
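For reference, skipping the WAL in this era is a per-edit, client-side switch. Below is a minimal sketch against the old (pre-1.0) HTable client API; the table, family, and qualifier names are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");  // hypothetical table
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"),
            Bytes.toBytes("value"));
        // Skip the WAL for this edit: the put is faster, but the edit lives
        // only in the region server's MemStore until a flush, so an RS crash
        // before that point loses it.
        put.setWriteToWAL(false);
        table.put(put);
        table.close();
      }
    }

With the WAL skipped, such an edit only becomes durable once the region server flushes its MemStore, which is what the force-flush advice further down the thread addresses.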
On Tue, Apr 6, 2010 at 12:41 PM, Lars George <lars.geo...@gmail.com> wrote:
> I agree with Jon here; parsing these files, especially without
> central logging, is bad. I tried Splunk and that sort of worked as
> well to quickly scan for exceptions. A problem was multiline
> stacktraces (which they usually all are). They got mixed up when
> multiple servers sent events at the same time. The Splunk data got
> all garbled then. But something like that, yeah.
>
> Maybe with the new multiput-style stuff the WAL is not such a big
> overhead anymore?
>
> Lars
>
> On Tue, Apr 6, 2010 at 7:12 PM, Jonathan Gray <jg...@facebook.com> wrote:
>> I like this idea.
>>
>> Putting major cluster events in some form into ZK. Could be used for
>> jobs as Todd says. Can also be used as a cluster history report on
>> the web UI and such. A higher-level historian.
>>
>> I'm a fan of anything that moves us away from requiring parsing
>> hundreds or thousands of lines of logs to see what has happened.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: Todd Lipcon [mailto:t...@cloudera.com]
>>> Sent: Tuesday, April 06, 2010 9:49 AM
>>> To: hbase-dev@hadoop.apache.org
>>> Subject: Re: Should HTable.put() return a Future?
>>>
>>> On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans
>>> <jdcry...@apache.org> wrote:
>>>
>>> > Yes it is, you will be missing a RS ;)
>>> >
>>> How do you detect this, though?
>>>
>>> It might be useful to add a counter in ZK for region server crashes.
>>> If the master ever notices that a RS goes down, it increments it.
>>> Then we can check the before/after for a job and know when we might
>>> have lost some data.
>>>
>>> -Todd
>>>
>>> > General rule when uploading without the WAL is that if there's a
>>> > failure, the job is screwed, and that's the tradeoff for speed.
>>> >
>>> > J-D
>>> >
>>> > On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>> > > On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans
>>> > > <jdcry...@apache.org> wrote:
>>> > >
>>> > >> The issue isn't with the write buffer here, it's the WAL. Your
>>> > >> edits are in the MemStore, so as far as your clients can tell,
>>> > >> the data is all persisted. In this case you would need to know
>>> > >> when all the memstores that contain your data are flushed...
>>> > >> Best practice when turning off the WAL is to force-flush the
>>> > >> tables after the job is done, else you can't guarantee
>>> > >> durability for the last edits.
>>> > >>
>>> > > You still can't guarantee durability for any of the edits, since
>>> > > a failure in the middle of your job is undetectable :)
>>> > >
>>> > > -Todd
>>> > >
>>> > >> J-D
>>> > >>
>>> > >> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.geo...@gmail.com> wrote:
>>> > >> > Hi,
>>> > >> >
>>> > >> > I have an issue where I do a bulk import, and since the WAL
>>> > >> > is off and the default write buffer is used
>>> > >> > (TableOutputFormat), I am running into situations where the
>>> > >> > MR job completes successfully but not all data is actually
>>> > >> > restored. The issue seems to be a failure on the RS side, as
>>> > >> > it cannot flush the write buffers because the MR job
>>> > >> > overloads the cluster (usually the .META.-hosting RS is the
>>> > >> > breaking point) or causes the underlying DFS to go slow, and
>>> > >> > that has repercussions all the way up to the RSs.
>>> > >> >
>>> > >> > My question is, would it make sense, as with any other
>>> > >> > asynchronous IO, to return a Future from put() that would
>>> > >> > help check the status of the actual server-side async flush
>>> > >> > operation? Or am I misguided here? Please advise.
>>> > >> >
>>> > >> > Lars
>>> > >
>>> > > --
>>> > > Todd Lipcon
>>> > > Software Engineer, Cloudera
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
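Following up on J-D's best-practice note above, a minimal sketch of force-flushing a table once the job finishes, again against the pre-1.0 client API and with a made-up table name. Note that HBaseAdmin.flush() only requests the flush; the region servers carry it out asynchronously, so edits written after the flush completes, or a crash before it does, are still exposed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class FlushAfterJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Ask every region server hosting regions of this table to flush
        // its MemStores to store files. With the WAL off, this flush is the
        // only point at which the job's edits become durable.
        admin.flush("testtable");  // hypothetical table name
      }
    }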
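On the subject line itself: HTable.put() returns void today, so any Future has to be layered on in the client. A hypothetical sketch (the AsyncPutter class and its shape are invented for illustration) that runs the buffered put plus flushCommits() on an executor, so callers can at least wait on, and see exceptions from, the client-side flush:

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    // Hypothetical wrapper: the Future here covers only the client-side
    // write buffer, not the server-side MemStore flush Lars asks about.
    public class AsyncPutter {
      private final HTable table;
      // One thread only: HTable instances are not thread-safe, so the
      // executor also serializes all access to the table.
      private final ExecutorService pool = Executors.newSingleThreadExecutor();

      public AsyncPutter(HTable table) {
        this.table = table;
      }

      public Future<Void> put(final List<Put> puts) {
        return pool.submit(new Callable<Void>() {
          public Void call() throws Exception {
            table.put(puts);      // buffered in the client write buffer
            table.flushCommits(); // push the buffer to the region servers
            return null;          // Future.get() rethrows any IOException
          }
        });
      }
    }

Even a successful get() on that Future only proves the edits reached the region servers' MemStores; as Todd and J-D point out above, with the WAL off a mid-job RS crash can still silently drop them.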