Andrew Purtell updated HBASE-18549:
    Fix Version/s:     (was: 1.4.2)

> Unclaimed replication queues can go undetected
> ----------------------------------------------
>                 Key: HBASE-18549
>                 URL: https://issues.apache.org/jira/browse/HBASE-18549
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Ashu Pachauri
>            Priority: Critical
>             Fix For: 1.3.2, 1.5.0, 1.4.3
> We have repeatedly come across situations where a ZooKeeper issue causes 
> NodeFailoverWorker to silently fail to pick up the replication queue of a 
> dead region server. One example is when the znode size for a particular 
> queue exceeds the jute.maxBuffer value.
> Other situations may lead to the same failure and also go undetected. 
> We need a metric for the number of unclaimed replication queues. Alerting 
> on this metric will help mitigate the problem and identify the underlying 
> issues.
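
A minimal sketch of how such a count could be derived, assuming the set of
servers that still own replication queues and the set of live region servers
have already been read from ZooKeeper. The class name, method name, and
server names below are hypothetical and are not part of HBase; this is only
an illustration of the idea, not the fix proposed in this issue.

```java
import java.util.HashSet;
import java.util.Set;

class UnclaimedQueueCounter {
    // Hypothetical helper: a queue owner that is not a live region server
    // holds queues that nobody has claimed yet.
    static int countUnclaimed(Set<String> queueOwners, Set<String> liveServers) {
        Set<String> unclaimed = new HashSet<>(queueOwners);
        unclaimed.removeAll(liveServers);
        return unclaimed.size();
    }

    public static void main(String[] args) {
        // Illustrative server names only.
        Set<String> queueOwners = Set.of("rs1", "rs2", "rs3");
        Set<String> liveServers = Set.of("rs1", "rs3");
        System.out.println(countUnclaimed(queueOwners, liveServers));
    }
}
```

A value that stays above zero for longer than the normal failover window
would be the condition to alert on.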

This message was sent by Atlassian JIRA