2010/4/26 Michał Podsiadłowski <podsiadlow...@gmail.com>

> Hi hbase users,
>
> during our tests in the production environment we found a few really big
> problems that stop us from using hbase. The first major problem is
> availability: we now have 6 region servers + 2 masters + 3 zk. When
> we shut down one region server normally, it takes about 3-4 minutes or
> longer, depending on previous load, until the master reassigns the missing
> regions to the live rs. Each region server usually has fewer than 100
> regions. In the master logs we can see some log splitting, then a long
> break, and then the start of reassignment, which can also take a long time,
> especially when the cluster is under load. This is far longer than we can
> wait, because during that time requests to the website are not processed.
> An additional, very unfortunate situation happened when my friend shut down
> 3 out of 6 nodes: the master started to do the job but something went
> terribly wrong and it started to throw NPEs like mad.
> Here is the beginning of the disaster: http://pastebin.com/1uh1x1fL
> After we killed this server the second one picked up and managed to start,
> but with only 91 out of 306 regions and only after a long time.
> Another big problem is that table connections in some circumstances
> hang with no error thrown. The web servers' request-processing thread pool
> quickly runs out of threads, no requests are processed, and the watchdog
> kills the server.
>
>
> For those who want more reading: http://pastebin.com/UaEPT6nc is the master
> log from the beginning of the test,
> and the second master log is at http://pastebin.com/shpcDWBn
>
>
> Any help appreciated.
> Thanks, Michal
>

I noticed our region failovers take around 10-30 seconds, but we did not
have very high load at the time.

As for the client, we noticed this too. If something fails in the hbase
stack (zookeeper, region, etc.), the connections never seem to time out. We
would end up with many webserver threads waiting and hanging on hbase that
were never going to recover. I think there are many cases where clients
never time out. Sorry for a vague, unsubstantiated statement like that (with
no stack trace).
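
One possible way to stop web threads from waiting forever, sketched here
under assumptions rather than offered as a tested fix: run the blocking call
on a separate thread pool and give up after a deadline. BoundedCall and
callWithTimeout are hypothetical names, not part of hbase; the sketch uses
only java.util.concurrent, and the lambdas in main stand in for whatever
hbase call is hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    // Dedicated pool for storage calls, so a stuck call ties up one of these
    // threads instead of a web request thread. Daemon threads are used so a
    // hung call cannot keep the JVM from exiting.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    // Runs the given call and waits at most timeoutMs for its result.
    // Hypothetical helper: the Callable is assumed to wrap the blocking call.
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker to stop by interrupting it. A call blocked in
            // non-interruptible I/O may ignore this and keep its thread.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns promptly.
        String value = callWithTimeout(() -> "row-value", 1000);
        System.out.println(value);

        // A call that takes too long: the caller gets control back.
        try {
            callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

This only frees the caller. If the underlying call never returns and ignores
the interrupt, its worker thread stays stuck, so the pool can still grow
under a long outage; it limits the damage to the web tier but does not fix
the missing timeout in the client itself.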
