Parts of this may end up on the hbase list, but I thought I'd start here. My 
basic problem is:

My cluster is getting full enough that having one data node go down does put a 
bit of pressure on the system (when balanced, every DN is more than half full).

I write (and delete) pretty actively to HBase, plus some directly to HDFS.

The cluster keeps drifting dangerously out of balance.

I run the balancer daily, but:

   - I've seen reports that you shouldn't rebalance with regionservers running, 
yet I don't really have a choice. Without HBase, my system is pretty much 
down. If it gets out of balance, it will also come down.

  Anybody here have any idea how badly running the balancer on a heavily active 
system messes things up? (for hdfs/hbase - if anyone knows).

   - Possibly somewhat related: I'm seeing more "failed to move block" errors 
in my balancer logs. It got to the point where I wasn't seeing any effective 
rebalancing occur. I've turned off access to the cluster and rebalanced (one 
node was down to 10% free space, a couple of others went up to 50% or more). I'm 
back down to around 20-40% free space on each node (as reported by the HDFS web 
interface).

    How effective is the balancer on an active cluster? Is there any way to make 
its life easier, so it can stay in balance with daily runs?

I'm not sure why the one node ends up being so heavily favored, either. The 
favoritism even seems to survive taking the node down, and bringing it back up. 
If I can't find the resources to upgrade, I might try that again, but I'm less 
than hopeful about it.

Any ideas? Or do I just need better hardware? Not sure if that's an option, 
though.

Take care,
  -stu


      
