Ashu Pachauri created HBASE-18549:

             Summary: Unclaimed replication queues can go undetected
                 Key: HBASE-18549
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Ashu Pachauri
            Priority: Critical
             Fix For: 1.3.2

We have come across this situation multiple times: a ZooKeeper issue can 
cause NodeFailoverWorker to silently fail to pick up the replication queue for a 
dead region server. One example is when the znode size for a particular queue 
exceeds the jute.maxBuffer value.
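A minimal Python sketch of this failure mode, built on stated assumptions. It is not HBase's NodeFailoverWorker. `read_queue`, `claim_queues` and `ZnodeTooLargeError` are hypothetical names, and the size comparison only models the effect of a znode exceeding jute.maxBuffer. The sketch shows that when the error is caught and only logged, the queue is left behind and nothing else records it.

```python
import logging


class ZnodeTooLargeError(Exception):
    """Hypothetical stand-in for the error raised when a znode exceeds jute.maxBuffer."""


def read_queue(znode_size: int, max_buffer: int) -> None:
    # Model only: a read fails once the serialized znode is larger than the buffer limit.
    if znode_size > max_buffer:
        raise ZnodeTooLargeError(f"znode of {znode_size} bytes exceeds {max_buffer}")


def claim_queues(dead_server_queues: dict[str, int], max_buffer: int) -> list[str]:
    """Try to claim every queue (queue id -> znode size) of a dead server.

    Failures are only logged, which is the silent path described above.
    """
    claimed = []
    for queue_id, size in dead_server_queues.items():
        try:
            read_queue(size, max_buffer)
            claimed.append(queue_id)
        except ZnodeTooLargeError as exc:
            # Nothing but this log line records that the queue was left unclaimed.
            logging.warning("could not claim %s: %s", queue_id, exc)
    return claimed
```

For example, `claim_queues({"peer1": 100, "peer2": 5_000_000}, max_buffer=1_048_575)` returns only `["peer1"]`. The caller cannot tell that `peer2` was skipped.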

There can be other situations that lead to this and go undetected. We need a 
metric for the number of unclaimed replication queues. It would help mitigate 
the problem by allowing alerts on the metric and would help identify the 
underlying issues.
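One way such a metric could be computed is sketched below in Python. It is a hypothetical illustration, not the actual fix. The code assumes an in-memory snapshot mapping each region server name to the replication queue ids still stored under it, plus the set of live servers. Any queue still listed under a server that is no longer live counts as unclaimed. `count_unclaimed_queues` and `should_alert` are invented names.

```python
def count_unclaimed_queues(queues_by_server: dict[str, list[str]],
                           live_servers: set[str]) -> int:
    """Count queues still owned by servers that are not live (hypothetical snapshot model)."""
    # A queue under a dead server has not been claimed by any live server yet.
    return sum(len(queues) for server, queues in queues_by_server.items()
               if server not in live_servers)


def should_alert(unclaimed: int, threshold: int = 0) -> bool:
    # Alert when the number of unclaimed queues is above the tolerated threshold.
    return unclaimed > threshold
```

With `{"rs1,16020,1": ["peer1"], "rs2,16020,2": ["peer1", "peer2"], "rs3,16020,3": []}` and only `rs1,16020,1` live, the count is 2. Queues of a server that has only just died are legitimately unclaimed until failover runs, so a real alert would likely need a threshold or a grace period rather than firing on any non-zero value.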

This message was sent by Atlassian JIRA
