Mike Drob commented on HBASE-20835:

I think part of my confusion may come from the state of the replication
internals. The layout in ZK appears to be private and subject to change at any
version boundary, because the constants and config properties are defined in
IA.Private classes such as ZKReplicationPeerStorage. Is there a stable,
public-facing way to get the metrics I'm looking for?

Currently we read ZK directly because that is the only place to get this
information, and that approach is very brittle. I would be very happy to move
to a better-defined API.

> Document how to get replication reporting
> -----------------------------------------
>                 Key: HBASE-20835
>                 URL: https://issues.apache.org/jira/browse/HBASE-20835
>             Project: HBase
>          Issue Type: Task
>          Components: Replication
>    Affects Versions: 2.1.0
>            Reporter: Mike Drob
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0
> Based on my questions at the tail end of HBASE-19543:
> bq. We have some tooling that checks on replication queues and reads the
> znode as the source of truth. When replication is disabled, the node is
> expected to still be there, just empty. Is there a better way to get this
> same information?
> I understand that with table based replication it doesn't make sense to check 
> ZK for status. However, losing the ability to inspect the data and get 
> information is a tough hit for operators. Do we have APIs that expose the 
> same sort of metrics?
> bq. how many peers/queues, queue size, position in the queue, and age of last 
> op
> Assigning to you for now, Duo, since you were both the primary implementor
> and the RM for 2.1.0, and I'm not sure who else would know the answers. If
> the docs already exist, then there is nothing to do except include them in
> the RN. Maybe this will need additional code, but I hope it's already there
> and is something we can write a workaround for.

This message was sent by Atlassian JIRA
