Hey Andrew,

They were still all dead? From session expiration or OOME? Or HDFS issues?

J-D

On Thu, Dec 2, 2010 at 3:17 PM, Andrew Purtell <[email protected]> wrote:
> J-D,
>
> Your hypothesis is interesting.
>
> I took the same step -- change 100 -> 10 -- to reduce the probability that 
> regionservers would OOME under the high write load generated by an end-to-end 
> simulation I have been developing to model an application we plan to deploy. 
> (Stack, this is the next generation of the monster that led us to find the 
> problem with ByteArrayOutputStream buffer management in the 0.19 time frame. 
> It's baaaaack, bigger than before.)
>
> Reducing handler.count did move the needle, but sooner or later they are all 
> dead, at 4G heap, or 8G heap... and the usual GC tuning tricks are not 
> helping.
>
> When I get back from this latest tour of Asia next week I need to dig in with 
> jhat and jprofiler.
>
> Best regards,
>
>    - Andy
>
>
> --- On Thu, 12/2/10, Jean-Daniel Cryans (JIRA) <[email protected]> wrote:
>
>> From: Jean-Daniel Cryans (JIRA) <[email protected]>
>> Subject: [jira] Created: (HBASE-3303) Lower hbase.regionserver.handler.count 
>> from 25 back to 10
>> To: [email protected]
>> Date: Thursday, December 2, 2010, 2:02 PM
>> Lower hbase.regionserver.handler.count from 25 back to 10
>> ----------------------------------------------------------
>>
>>                  Key: HBASE-3303
>>                  URL: https://issues.apache.org/jira/browse/HBASE-3303
>>              Project: HBase
>>           Issue Type: Improvement
>>             Reporter: Jean-Daniel Cryans
>>             Assignee: Jean-Daniel Cryans
>>              Fix For: 0.90.0
>>
>>
>> With HBASE-2506 in mind, I tested a low-memory environment
>> (2GB of heap) with a lot of concurrent writers using the
>> default write buffer, to verify whether a lower number of
>> handlers actually helps reduce the occurrence of full GCs.
>> Very unscientifically, at this moment I think it's safe to
>> say that yes, it helps.
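>>
>> (For illustration -- a rough sketch of what one of those concurrent
>> writers looks like against the 0.90 client API, relying on the
>> default client-side write buffer; the table/family names, value size
>> and loop count are made up:
>>
>>   // imports assumed: org.apache.hadoop.conf.Configuration,
>>   // org.apache.hadoop.hbase.HBaseConfiguration,
>>   // org.apache.hadoop.hbase.client.{HTable, Put},
>>   // org.apache.hadoop.hbase.util.Bytes, java.util.Random
>>   Configuration conf = HBaseConfiguration.create();
>>   HTable table = new HTable(conf, "usertable");
>>   table.setAutoFlush(false);   // buffer puts client-side, up to
>>                                // hbase.client.write.buffer bytes
>>   Random rand = new Random();
>>   for (int i = 0; i < 1000000; i++) {
>>     Put put = new Put(Bytes.toBytes(rand.nextLong()));
>>     put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), new byte[1000]);
>>     table.put(put);            // flushed in write-buffer-sized batches
>>   }
>>   table.flushCommits();
>>
>> so each multi-put that lands on a handler can carry up to a write
>> buffer's worth of data.)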
>>
>> With the defaults, I saw a region server struggling more
>> and more because the random inserters at some point started
>> filling up all the handlers, which were all BLOCKED trying
>> to sync the WAL. It's safe to say that each of those clients
>> carried a payload that the GC cannot get rid of, and one
>> that we don't account for (as opposed to the MemStore and
>> the block cache).
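>>
>> (Back of the envelope, assuming the defaults I have in mind: 25
>> handlers each holding a multi-put of up to hbase.client.write.buffer,
>> 2MB by default, is already ~50MB of heap that nothing accounts for,
>> before counting whatever is still sitting in the IPC call queue --
>> not trivial on a 2GB heap that also has to fit the MemStores and the
>> block cache.)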
>>
>> With a much lower setting of 5, I didn't see that
>> situation.
>>
>> It kind of confirms my hypothesis, but I need to do more
>> proper testing. In the meantime, in order to reduce the
>> onslaught of users who write to the ML complaining about
>> either GCs or OOMEs, I think we should set the handler
>> count back to what it was originally (10) for 0.90.0 and
>> add some documentation about configuring
>> hbase.regionserver.handler.count.
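>>
>> For the documentation, something along these lines (illustrative
>> hbase-site.xml snippet, pick a value that fits your workload):
>>
>>   <property>
>>     <name>hbase.regionserver.handler.count</name>
>>     <value>10</value>
>>     <description>Number of RPC handler threads on each region
>>     server. Be careful raising it: every in-flight write request
>>     holds heap that isn't otherwise accounted for.</description>
>>   </property>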
>>
>> I'd like to hear others' thoughts.
>>
>
>
>
>
