HBase region server failure issues

Claudiu Soroiu Mon, 14 Apr 2014 13:48:29 -0700

Hi all,

My name is Claudiu Soroiu. I am new to HBase/Hadoop, but not new to
distributed computing in FT/HA environments, and I see that many of the
reported issues relate to region server failure.


The main problem I see is the recovery time after a node failure, which
includes distributed log splitting. After some tuning I managed to
reduce it to 8 seconds in total, and for the moment that is sufficient.

I have one question: *Why is there only one WAL file per region server and
not one WAL per region?*
I haven't found an exact answer anywhere, which is why I'm asking on this
list. Please point me in the right direction if I picked the wrong list.

My point is that eliminating the need to split a log after a failure would
reduce the downtime for the regions. The only remaining delay would come
from transferring data over the network to the region servers that take
over the failed regions.
This is feasible only if having multiple WALs per region server does not
hurt the overall write performance.
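
To make the question concrete, here is a toy sketch of the splitting step as I understand it. It is not HBase code: the `(region, sequence_id, edit)` tuple layout, the region names, and the `split_log` function are all made up for illustration. With one WAL per region server, edits for different regions are interleaved in a single log, so they have to be regrouped by region before each region can be replayed elsewhere.

```python
from collections import defaultdict

# Hypothetical model: each WAL entry is (region, sequence_id, edit).
# These names are illustrative only and are not HBase APIs.
def split_log(shared_wal):
    """Group the interleaved entries of one per-server WAL by region."""
    per_region = defaultdict(list)
    for region, seq_id, edit in shared_wal:
        # Appending in log order preserves each region's edit order.
        per_region[region].append((seq_id, edit))
    return dict(per_region)

# One shared log on the failed server, with edits from three regions mixed.
shared_wal = [
    ("region-a", 1, "put k1"),
    ("region-b", 2, "put k7"),
    ("region-a", 3, "put k2"),
    ("region-c", 4, "delete k9"),
    ("region-b", 5, "put k8"),
]

recovered = split_log(shared_wal)
for region in sorted(recovered):
    print(region, recovered[region])
```

With one WAL per region, each of these per-region lists would already exist as its own log, so the grouping pass would not be needed at recovery time. The cost would move to the write path, where the server would append to many logs instead of one.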

Thanks,
Claudiu
