wchevreuil commented on a change in pull request #1894:
URL: https://github.com/apache/hbase/pull/1894#discussion_r440335530



##########
File path: src/main/asciidoc/_chapters/ops_mgt.adoc
##########
@@ -2629,6 +2629,91 @@ You can use the HBase Shell command `status 
'replication'` to monitor the replic
 * `status 'replication', 'source'` -- prints the status for each replication 
source, sorted by hostname.
 * `status 'replication', 'sink'` -- prints the status for each replication 
sink, sorted by hostname.
 
+==== Understanding the output
+
+The command output will vary according to the state of replication. For 
example right after a restart
+and if destination peer is not reachable, no replication source threads would 
be running,
+so no metrics would get displayed:
+
+----
+hbase01.home:
+SOURCE: PeerID=1
+Normal Queue: 1
+No Reader/Shipper threads runnning yet.
+SINK: TimeStampStarted=1591985197350, Waiting for OPs...
+----
+
+Under normal circumstances, a healthy, active-active replication deployment 
would
+show the following:
+
+----
+    hbase01.home:
+      SOURCE: PeerID=1
+         Normal Queue: 1
+           AgeOfLastShippedOp=0, TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 
BST 2020, SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, 
TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, Replication Lag=0
+      SINK: TimeStampStarted=1591983663458, AgeOfLastAppliedOp=0, 
TimeStampsOfLastAppliedOp=Fri Jun 12 18:57:18 BST 2020

Review comment:
       > Just a question: For active-active replication, in order to get peerId 
of both clusters (defined in each other), we need to run status 'replication' 
at both clusters side right?
   
   Yes. The command only reflects the cluster it is run on, listing overall 
stats for that cluster's source queues and sink threads.
   
   >Getting ageOfLastShipped etc metric values from remote cluster is also not 
that easy even if we want to display here.
   
   This "ageOfLastShipped" metric belongs to the source cluster. On the 
source, the _ReplicationSourceShipper_ thread reads entries from the WAL 
and makes synchronous RPC calls to _ReplicationSink_ on the target. If the 
call succeeds, we take the current time in _ReplicationSourceShipper_, subtract 
the edit's entry time from it, and record the result as "ageOfLastShipped". So 
"ageOfLastShipped" is how long a given edit took from the moment it entered the 
source cluster until the source cluster considered it successfully replicated.
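
   To make the arithmetic concrete, here is a minimal sketch of that calculation. It is not the actual _ReplicationSourceShipper_ code: the class name, the method name and the sample timestamps are hypothetical, and the only thing it takes from the description above is "time the ship call succeeded minus the edit's entry time".

```java
// Illustrative sketch only; names and values are hypothetical, not HBase APIs.
public final class AgeOfLastShippedSketch {

    private AgeOfLastShippedSketch() {
    }

    /**
     * Returns the age in milliseconds: the time the source saw the ship call
     * succeed, minus the time the edit was written on the source cluster.
     */
    public static long ageOfLastShipped(long editEntryTimeMs, long shipSuccessTimeMs) {
        return shipSuccessTimeMs - editEntryTimeMs;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps: the edit enters the source WAL, and the
        // synchronous call to the sink returns successfully 250 ms later.
        long editEntryTimeMs = 1_591_984_163_000L;
        long shipSuccessTimeMs = editEntryTimeMs + 250L;

        System.out.println("ageOfLastShipped="
            + ageOfLastShipped(editEntryTimeMs, shipSuccessTimeMs));
    }
}
```

   With these sample values the program prints `ageOfLastShipped=250`. Both timestamps are taken on the source side, which is why the metric can be reported without asking the remote cluster for anything.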
   
   @virajjasani do you think this metric description should be improved? 
It looks like the current text is not that clear.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

