Hey Andrew,

They were still all dead? From session expiration or OOME? Or HDFS issues?
J-D

On Thu, Dec 2, 2010 at 3:17 PM, Andrew Purtell <[email protected]> wrote:
> J-D,
>
> Your hypothesis is interesting.
>
> I took the same step -- change 100 -> 10 -- to reduce the probability that
> regionservers would OOME under high write load as generated by an end-to-end
> simulation I have been developing, to model an application we plan to
> deploy. (Stack, this is the next generation of the monster that led us to
> find the problem with ByteArrayOutputStream buffer management in the 0.19
> time frame. It's baaaaack, bigger than before.)
>
> Reducing handler.count did move the needle, but sooner or later they are
> all dead, at 4G heap, or 8G heap... and the usual GC tuning tricks are not
> helping.
>
> When I get back from this latest tour of Asia next week I need to dig in
> with jhat and jprofiler.
>
> Best regards,
>
>    - Andy
>
>
> --- On Thu, 12/2/10, Jean-Daniel Cryans (JIRA) <[email protected]> wrote:
>
>> From: Jean-Daniel Cryans (JIRA) <[email protected]>
>> Subject: [jira] Created: (HBASE-3303) Lower hbase.regionserver.handler.count
>>   from 25 back to 10
>> To: [email protected]
>> Date: Thursday, December 2, 2010, 2:02 PM
>>
>> Lower hbase.regionserver.handler.count from 25 back to 10
>> ----------------------------------------------------------
>>
>>                  Key: HBASE-3303
>>                  URL: https://issues.apache.org/jira/browse/HBASE-3303
>>              Project: HBase
>>           Issue Type: Improvement
>>             Reporter: Jean-Daniel Cryans
>>             Assignee: Jean-Daniel Cryans
>>              Fix For: 0.90.0
>>
>>
>> With HBASE-2506 in mind, I tested a low-memory environment (2GB of heap)
>> with a lot of concurrent writers using the default write buffer, to verify
>> whether a lower number of handlers actually helps reduce the occurrence of
>> full GCs. Very unscientifically, at this moment I think it's safe to say
>> that yes, it helps.
>>
>> With the defaults, I saw a region server struggling more and more because
>> the random inserters at some point started filling up all the handlers and
>> were all BLOCKED trying to sync the WAL. It's safe to say that each of
>> those clients carried a payload that the GC cannot get rid of, and it's
>> one that we don't account for (as opposed to the MemStore and the block
>> cache).
>>
>> With a much lower setting of 5, I didn't see that situation.
>>
>> It kind of confirms my hypothesis, but I need to do more proper testing.
>> In the meantime, in order to lower the onslaught of users that write to
>> the ML complaining about either GCs or OOMEs, I think we should set the
>> handlers back to what they were originally (10) for 0.90.0 and add some
>> documentation about configuring hbase.regionserver.handler.count.
>>
>> I'd like to hear others' thoughts.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
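For anyone who wants to try the lower value on their own cluster, this is a one-property override in hbase-site.xml on each region server. A minimal sketch, assuming the rest of the site file is left as-is:

  <property>
    <!-- default is currently 25; lowering back to 10 as discussed above -->
    <name>hbase.regionserver.handler.count</name>
    <value>10</value>
  </property>

Region servers only read this at startup, so a (rolling) restart is needed for it to take effect.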
