Hi folks,

Is it necessary to keep the clocks synchronized on all HBase region servers and the master? I would appreciate it a lot if somebody could explain whether the HBase architecture depends on this.

thanks,
dhruba

On Wed, Dec 23, 2009 at 9:57 AM, Mark Vigeant <[email protected]> wrote:

> The clocks are all running in sync, though I am not using NTP shamefully. I
> should.
>
> And no, I listed the errors backwards, that's not how they showed up in the
> log, sorry, heh. I don't think they run backwards.
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Wednesday, December 23, 2009 12:47 PM
> To: [email protected]
> Subject: Re: Smaller Region Size?
>
> How do you have clocks set up on your systems Mark? Are you using NTP to keep
> them sane? Am I correct that they are sometimes running backward?
>
> - Andy
>
> ----- Original Message ----
> > From: Mark Vigeant <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wed, December 23, 2009 9:09:04 AM
> > Subject: RE: Smaller Region Size?
> >
> > > The biggest legitimate reason to run smaller region size is if your
> > > data set is small (lets say 400mb) but highly accessed, so you want a
> > > good spread of regions across your cluster.
> >
> > That's exactly it, my input dataset was 500MB total (~1,000,000 rows) and it was
> > getting stored as just one region on one regionserver.
> >
> > In response to St.Ack, I don't think my regions are performing too many splits:
> > the regionserver logs mainly consist of the occasional ZooKeeper Connection
> > error and these two repeatedly:
> >
> > 2009-12-22 15:21:50,415 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> > Cache Stats: Sizes: Total=6.556961MB (6875472), Free=792.61804MB (831120240),
> > Max=799.175MB (837995712), Counts: Blocks=0, Access=25755, Hit=0, Miss=25755,
> > Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
> > Evicted/Run=NaN
> >
> > 2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> > Skipping major compaction of Message because one (major) compacted file only and
> > elapsedTime 339624149ms is < ttl=9223372036854775807
> >
> > You're suggesting the performance would be improved if the dataset was larger?
> > What are other parameters that can be fine-tuned to optimize based off data
> > size?
> >
> > Thanks
> > -Mark
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:[email protected]]
> > Sent: Tuesday, December 22, 2009 11:28 PM
> > To: [email protected]
> > Subject: Re: Smaller Region Size?
> >
> > The biggest legitimate reason to run smaller region size is if your
> > data set is small (lets say 400mb) but highly accessed, so you want a
> > good spread of regions across your cluster.
> >
> > Another is to run a larger region if you are having a huge table and
> > you want to keep absolute region count low. I am not 100% sold on this
> > yet.
> >
> > I have a patch that can keep performance high during a highly split
> > table, by using parallel puts. This has been proven to keep aggregate
> > performance really high, and I hope it will make 0.20.3.
> >
> > On Tue, Dec 22, 2009 at 2:31 PM, stack wrote:
> > > On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant
> > > wrote:
> > >
> > >> J-D,
> > >>
> > >> I noticed that performance for uploading data into tables got a lot better
> > >> as I lowered the max file size -- but up until a certain point, where the
> > >> performance began slowing down again.
> > >>
> > >
> > > Tell us more. What kinda size changes did you make? How many regions were
> > > created? Is the slow down because table is splitting all the time? If you
> > > study regionserver logs, can you make out what the regionservers are
> > > spending their times doing?
> > >
> > >> Is there a rule of thumb/formula/notion to rely on when setting this
> > >> parameter for optimal performance? Thanks!
> > >>
> > >
> > > We have most experience running defaults. Generally folks go up from the
> > > default size because they want to host more data in about same number or
> > > regions. Going down from the default I've not seen much of.
> > >
> > > St.Ack
> > >
> >
> > This email message and any attachments are for the sole use of the intended
> > recipients and may contain proprietary and/or confidential information which may
> > be privileged or otherwise protected from disclosure. Any unauthorized review,
> > use, disclosure or distribution is prohibited. If you are not an intended
> > recipient, please contact the sender by reply email and destroy the original
> > message and any copies of the message as well as any attachments to the original
> > message.

--
Connect to me at http://www.facebook.com/dhruba
