Re: regarding to HBase 1316 ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Zhenyu Zhong Wed, 28 Oct 2009 07:46:00 -0700

Nitay,

Thank you, I really appreciate it.

As Ryan suggested, I increased the ZooKeeper session timeout to 40 seconds,
with the GC options -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC in place.
I set the heap size to 4GB. I also set vm.swappiness=0.
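
For reference, a minimal sketch of how these settings might be expressed,
assuming they are applied through conf/hbase-env.sh. The HBASE_HEAPSIZE and
HBASE_OPTS variable names are not mentioned in this thread and are an
assumption about the setup. The session timeout lives in hbase-site.xml as
zookeeper.session.timeout and is given in milliseconds, so 40 seconds would be
40000.

```sh
# conf/hbase-env.sh (sketch only; values mirror the settings described above)

# Heap size in MB, roughly 4GB.
export HBASE_HEAPSIZE=4000

# GC flags quoted in this message.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"

# Kernel setting, applied separately as root on each node
# (not part of hbase-env.sh):
#   sysctl -w vm.swappiness=0
```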

However, it still ran into problems. Please see the errors below.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server x.x.x.x:60021 for region
YYYY,117.99.7.153,1256396118155, row '1170491458', but failed after 10
attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1

        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:413)

The input file is about 10GB, around 200 million rows of data.
This load doesn't seem too large. However, these kinds of errors keep popping
up.

Do RegionServers need to be deployed on dedicated machines?
Does ZooKeeper need to be deployed on dedicated machines as well?

Best,
zhenyu

On Wed, Oct 28, 2009 at 1:37 AM, nitay <[email protected]> wrote:

> Hi Zhenyu,
>
> Sorry for the delay. I started working on this a while back, before I left
> my job for another company. Since then I haven't had much time to work on
> HBase unfortunately :(. I'll try to dig up what I had and see what shape
> it's in and update you.
>
> Cheers,
> -n
>
>
> On Oct 27, 2009, at 3:38 PM, Ryan Rawson wrote:
>
>  Sorry I must have mistyped, I meant to say "40 seconds".  You can
>> still see multi-second pauses at times, so you need to give yourself a
>> bigger buffer.
>>
>> The parallel threads argument should not be necessary, but you do need
>> the UseConcMarkSweepGC flag as well.
>>
>> Let us know how it goes!
>> -ryan
>>
>>
>> On Tue, Oct 27, 2009 at 3:19 PM, Zhenyu Zhong <[email protected]>
>> wrote:
>>
>>> Ryan,
>>> Thank you very much for your feedback.
>>> I have set the zookeeper.session.timeout to seconds, which is way higher
>>> than 40ms.
>>> At the same time, -Xms is set to 4GB, which should be sufficient.
>>> I also tried GC options like
>>>
>>>  -XX:ParallelGCThreads=8
>>> -XX:+UseConcMarkSweepGC
>>>
>>> I even set the vm.swappiness=0
>>>
>>> However, I still ran into the problem that a RegionServer shut itself
>>> down.
>>>
>>> Best,
>>> zhong
>>>
>>>
>>> On Tue, Oct 27, 2009 at 6:05 PM, Ryan Rawson <[email protected]> wrote:
>>>
>>>  Set the ZK timeout to something like 40ms, and give the GC enough Xmx
>>>> so you never risk entering the much dreaded concurrent-mode-failure
>>>> whereby the entire heap must be GCed.
>>>>
>>>> Consider testing Java 7 and the G1 GC.
>>>>
>>>> We could get a JNI thread to do this, but no one has done so yet. I am
>>>> personally hoping for G1 and in the meantime overprovision our Xmx to
>>>> avoid the concurrent mode failures.
>>>>
>>>> -ryan
>>>>
>>>> On Tue, Oct 27, 2009 at 2:59 PM, Zhenyu Zhong <[email protected]>
>>>> wrote:
>>>>
>>>>> Ryan,
>>>>>
>>>>> Thank you very much.
>>>>> May I ask whether there are any ways to get around this problem to make
>>>>> HBase more stable?
>>>>>
>>>>> best,
>>>>> zhong
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 27, 2009 at 4:06 PM, Ryan Rawson <[email protected]>
>>>>> wrote:
>>>>>
>>>>>  There isn't any working code yet. Just an idea, and a prototype.
>>>>>>
>>>>>> There is some sense that if we can get the G1 GC that we could get rid
>>>>>> of all long pauses, and avoid the need for this.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Mon, Oct 26, 2009 at 2:30 PM, Zhenyu Zhong <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am very interested in the solution that Joey proposed and would like
>>>>>>> to give it a try.
>>>>>>> Does anyone have any ideas on how to deploy this zk_wrapper with JNI
>>>>>>> integration?
>>>>>>>
>>>>>>> I would really appreciate it.
>>>>>>>
>>>>>>> thanks
>>>>>>> zhong
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
