[ https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263420#comment-17263420 ]
Hudson commented on HBASE-24813:
--------------------------------

Results for branch branch-2.2
	[build #152 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/152/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/152//console].

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/152//console].

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/152//console].

(x) {color:red}-1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/152//console].

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-24813
>                 URL: https://issues.apache.org/jira/browse/HBASE-24813
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>   Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1, 2.3.5
>
>         Attachments: TestReplicationSyncUpTool.log, image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779,
> we found that once a peer is removed, thus killing the peer's related
> *ReplicationSource* instance, it may leave
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if
> *ReplicationSourceWALReader* had put some entries on its queue to be
> processed by *ReplicationSourceShipper*, but the peer removal killed the
> shipper before it could process the pending entries.
>
> When the *ReplicationSourceWALReader* thread adds entries to the queue, it
> increments *ReplicationSourceManager.totalBufferUsed* by the sum of the entry
> sizes. When those entries are read by *ReplicationSourceShipper*,
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource*
> is terminated; otherwise the size of those unprocessed entries would keep
> consuming *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the
> RS gets restarted. This may be a problem for deployments with multiple peers,
> or if new peers are added.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
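
For readers following the accounting described in the issue, here is a minimal, self-contained sketch of the leak and of the proposed release on termination. It is not the HBase code and does not reflect the actual patch: the class `BufferAccountingSketch`, the nested `Source` class, and the methods `readerEnqueue`, `shipperShipOne` and `terminate` are all hypothetical names chosen for illustration. Only the idea of a shared counter named `totalBufferUsed` comes from the issue text.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BufferAccountingSketch {
  // Stand-in for ReplicationSourceManager.totalBufferUsed: one counter shared by all sources.
  static final AtomicLong totalBufferUsed = new AtomicLong();

  // Hypothetical stand-in for one ReplicationSource and its reader-to-shipper queue.
  // Each queue element is just the size in bytes of a batch of entries.
  static class Source {
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();

    // Reader side: enqueue a batch and charge its size to the shared counter.
    void readerEnqueue(long batchSizeBytes) {
      queue.add(batchSizeBytes);
      totalBufferUsed.addAndGet(batchSizeBytes);
    }

    // Shipper side: take one batch (if any) and release its size.
    void shipperShipOne() {
      Long size = queue.poll();
      if (size != null) {
        totalBufferUsed.addAndGet(-size);
      }
    }

    // Termination: drain batches the shipper never processed and release their
    // size, so the shared counter does not stay inflated after the peer is removed.
    long terminate() {
      long released = 0;
      Long size;
      while ((size = queue.poll()) != null) {
        released += size;
      }
      totalBufferUsed.addAndGet(-released);
      return released;
    }
  }

  public static void main(String[] args) {
    Source source = new Source();
    source.readerEnqueue(100);
    source.readerEnqueue(250);
    source.shipperShipOne();                    // ships the 100-byte batch
    System.out.println(totalBufferUsed.get());  // 250: one batch still pending
    source.terminate();                         // peer removed before shipping it
    System.out.println(totalBufferUsed.get());  // 0: pending size was released
  }
}
```

Without the drain in `terminate`, the counter in this sketch would stay at 250 after the source goes away, which is the inconsistency the issue describes.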