[ https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263239#comment-17263239 ]
Wellington Chevreuil commented on HBASE-24813:
----------------------------------------------

Merged into master, branch-2, branch-2.4 and branch-2.2. Waiting on [~huaxiangsun]'s green light to merge into branch-2.3 once he's done with the 2.3.4 release.

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-24813
>                 URL: https://issues.apache.org/jira/browse/HBASE-24813
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1, 2.3.5
>
>         Attachments: TestReplicationSyncUpTool.log, image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, we found out that once a peer is removed, thus killing the peer's related *ReplicationSource* instance, it may leave *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if *ReplicationSourceWALReader* had put some entries on its queue to be processed by *ReplicationSourceShipper*, but the peer removal killed the shipper before it could process the pending entries. When the *ReplicationSourceWALReader* thread adds entries to the queue, it increments *ReplicationSourceManager.totalBufferUsed* by the sum of the entries' sizes. When those entries are read by *ReplicationSourceShipper*, *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* is terminated, otherwise the size of those unprocessed entries would keep consuming *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the RS gets restarted. This may be a problem for deployments with multiple peers, or if new peers are added.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
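To illustrate the accounting leak described above, here is a minimal, self-contained sketch of the increment-on-enqueue / decrement-on-ship pattern and the termination-time cleanup. It is not the actual HBase code: the {{Entry}} class, the {{totalBufferUsed}} field, and the method names ({{enqueue}}, {{ship}}, {{terminate}}) are simplified stand-ins for the corresponding pieces of *ReplicationSourceWALReader*, *ReplicationSourceShipper*, and *ReplicationSourceManager*.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; names are simplified stand-ins, not the HBase API.
public class BufferQuotaSketch {

  static class Entry {
    final long sizeInBytes;
    Entry(long sizeInBytes) { this.sizeInBytes = sizeInBytes; }
  }

  // Stand-in for ReplicationSourceManager.totalBufferUsed: global accounting
  // shared by all replication sources on the RegionServer.
  static final AtomicLong totalBufferUsed = new AtomicLong();

  private final BlockingQueue<Entry> entryQueue = new LinkedBlockingQueue<>();

  // Reader side: account for the entry's size before handing it to the shipper.
  void enqueue(Entry e) throws InterruptedException {
    totalBufferUsed.addAndGet(e.sizeInBytes);
    entryQueue.put(e);
  }

  // Shipper side: release the quota once the entry has been processed.
  void ship(Entry e) {
    totalBufferUsed.addAndGet(-e.sizeInBytes);
  }

  // Termination path: if the shipper is killed before draining the queue,
  // release the quota held by every unprocessed entry; otherwise the global
  // accounting stays inflated until the RegionServer restarts.
  void terminate() {
    Entry e;
    while ((e = entryQueue.poll()) != null) {
      totalBufferUsed.addAndGet(-e.sizeInBytes);
    }
  }
}
{code}

Without the drain loop in {{terminate()}}, every entry still sitting in the queue at peer-removal time would leave its size permanently counted against the shared buffer quota, which is the inconsistency the fix addresses.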