> That's funny. My understanding was, region servers were redundant
> inherently. If they are "semiredundant", there should be a root cause
> like some wrong settings or a bug.
>
> Could someone from HBase experts comment on this?

0.89 is a developer release and should be treated as such (e.g., do
expect bugs), and this is the version used by Matthew. A newer release
candidate was posted here:
http://people.apache.org/~jdcryans/hbase-0.89.20100924-candidate-1/
and this is the version we're using in production (and on a few other
clusters) at StumbleUpon. We can kill -9 region servers as much as we
want, and the cluster does recover. Previously there was an issue with
empty log files; that's now fixed. Also, 0.89.2010830 introduced a
change that's incompatible with the old way of recovering edits from
a failed region server, so if leftovers from a previous split are
present, this can prevent the master from splitting logs at all (which
is another issue Matthew hit in a separate thread).

J-D
