On Thu, Dec 2, 2010 at 3:21 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Hey Andrew,
>
> They were still all dead? From session expiration or OOME? Or HDFS issues?
>

I've found the same in my load testing - it's a compaction pause for me.
Avoiding heap fragmentation seems to be basically impossible.

-Todd

> J-D
>
> On Thu, Dec 2, 2010 at 3:17 PM, Andrew Purtell <[email protected]> wrote:
> > J-D,
> >
> > Your hypothesis is interesting.
> >
> > I took the same step -- change 100 -> 10 -- to reduce the probability
> > that regionservers would OOME under the high write load generated by an
> > end-to-end simulation I have been developing, to model an application we
> > plan to deploy. (Stack, this is the next generation of the monster that
> > led us to find the problem with ByteArrayOutputStream buffer management
> > in the 0.19 time frame. It's baaaaack, bigger than before.)
> >
> > Reducing handler.count did move the needle, but sooner or later they are
> > all dead, at 4G heap, or 8G heap... and the usual GC tuning tricks are
> > not helping.
> >
> > When I get back from this latest tour of Asia next week I need to dig in
> > with jhat and jprofiler.
> >
> > Best regards,
> >
> >   - Andy
> >
> >
> > --- On Thu, 12/2/10, Jean-Daniel Cryans (JIRA) <[email protected]> wrote:
> >
> >> From: Jean-Daniel Cryans (JIRA) <[email protected]>
> >> Subject: [jira] Created: (HBASE-3303) Lower hbase.regionserver.handler.count from 25 back to 10
> >> To: [email protected]
> >> Date: Thursday, December 2, 2010, 2:02 PM
> >>
> >> Lower hbase.regionserver.handler.count from 25 back to 10
> >> ----------------------------------------------------------
> >>
> >>                 Key: HBASE-3303
> >>                 URL: https://issues.apache.org/jira/browse/HBASE-3303
> >>             Project: HBase
> >>          Issue Type: Improvement
> >>            Reporter: Jean-Daniel Cryans
> >>            Assignee: Jean-Daniel Cryans
> >>             Fix For: 0.90.0
> >>
> >>
> >> With HBASE-2506 in mind, I tested a low-memory environment (2GB of heap)
> >> with a lot of concurrent writers using the default write buffer, to verify
> >> whether a lower number of handlers actually helps reduce the occurrence of
> >> full GCs. Very unscientifically, at this moment I think it's safe to say
> >> that yes, it helps.
> >>
> >> With the defaults, I saw a region server struggling more and more because
> >> the random inserters at some point started filling up all the handlers and
> >> were all BLOCKED trying to sync the WAL. It's safe to say that each of
> >> those clients carried a payload that the GC cannot get rid of, and it's
> >> one that we don't account for (as opposed to the MemStore and the block
> >> cache).
> >>
> >> With a much lower setting of 5, I didn't see that situation.
> >>
> >> It kind of confirms my hypothesis, but I need to do more proper testing.
> >> In the meantime, in order to lower the onslaught of users who write to the
> >> ML complaining about either GCs or OOMEs, I think we should set the
> >> handlers back to what it was originally (10) for 0.90.0 and add some
> >> documentation about configuring hbase.regionserver.handler.count.
> >>
> >> I'd like to hear others' thoughts.
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> -
> >> You can reply to this email to add a comment to the issue online.
> >>
> >>

--
Todd Lipcon
Software Engineer, Cloudera
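
For reference, a minimal sketch of where the property discussed above gets
set, assuming only the standard Hadoop/HBase Configuration API. In a real
deployment the setting would normally live as a <property> entry for
hbase.regionserver.handler.count in hbase-site.xml on the region servers;
the class name below is made up for illustration, and the value 10 simply
mirrors the default proposed in HBASE-3303.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HandlerCountSketch {
        public static void main(String[] args) {
            // Loads hbase-default.xml and hbase-site.xml from the classpath.
            Configuration conf = HBaseConfiguration.create();

            // Override the RPC handler count discussed in HBASE-3303.
            // Shown here in code only for illustration; operationally this
            // belongs in hbase-site.xml on each region server.
            conf.setInt("hbase.regionserver.handler.count", 10);

            // The second argument is just the fallback if the property is unset.
            System.out.println("hbase.regionserver.handler.count = "
                    + conf.getInt("hbase.regionserver.handler.count", 25));
        }
    }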
