[
https://issues.apache.org/jira/browse/HBASE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenglei updated HBASE-27778:
-----------------------------
Description: When we read a new WAL entry in
{{ReplicationSourceWALReader.readWALEntries}}, we increase
{{ReplicationSourceWALReader.totalBufferUsed}} by the size of the new entry in
{{ReplicationSourceWALReader.addEntryToBatch}}. However, the whole
{{WALEntryBatch}} may never be put into
{{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g.,
one thrown by {{WALEntryFilter.filter}} for a following WAL entry). In that
case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased, and
because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to
{{ReplicationSourceManager}}, after a long run replication to all peers may
hang. (was: When we read a new WAL entry in
{{ReplicationSourceWALReader.readWALEntries}}, we increase
{{ReplicationSourceWALReader.totalBufferUsed}} by the size of the new entry in
{{ReplicationSourceWALReader.addEntryToBatch}}. However, the whole
{{WALEntryBatch}} may never be put into
{{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g.,
one thrown by {{WALEntryFilter.filter}} for a following WAL entry). In that
case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased, and
because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to
{{ReplicationSourceManager}}, after a long run all peers may slow down and
eventually block completely.)
> Incorrect ReplicationSourceWALReader.totalBufferUsed may cause replication
> hang up
> ------------------------------------------------------------------------------------
>
> Key: HBASE-27778
> URL: https://issues.apache.org/jira/browse/HBASE-27778
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.6.0, 3.0.0-alpha-3
> Reporter: chenglei
> Priority: Major
>
> When we read a new WAL entry in
> {{ReplicationSourceWALReader.readWALEntries}}, we increase
> {{ReplicationSourceWALReader.totalBufferUsed}} by the size of the new entry
> in {{ReplicationSourceWALReader.addEntryToBatch}}. However, the whole
> {{WALEntryBatch}} may never be put into
> {{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g.,
> one thrown by {{WALEntryFilter.filter}} for a following WAL entry). In that
> case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased, and
> because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to
> {{ReplicationSourceManager}}, after a long run replication to all peers may
> hang.
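
The accounting problem described above can be illustrated with a minimal,
self-contained sketch. This is not the HBase code and not necessarily how the
issue is fixed: the class BufferQuotaSketch and the methods readBatchLeaky and
readBatchSafe are hypothetical names, entries are reduced to their sizes, and
a Predicate stands in for {{WALEntryFilter.filter}}. Only the name
totalBufferUsed is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

public class BufferQuotaSketch {
    // Stands in for the shared counter that is scoped to the manager,
    // i.e. shared by the readers of all peers (assumption for illustration).
    public static final AtomicLong totalBufferUsed = new AtomicLong();

    // Leaky variant: the counter grows per entry, but nothing rolls it back
    // when the filter throws before the batch is handed over.
    public static List<Long> readBatchLeaky(List<Long> entrySizes, Predicate<Long> filter) {
        List<Long> batch = new ArrayList<>();
        for (long size : entrySizes) {
            if (!filter.test(size)) { // may throw for a later entry
                continue;
            }
            totalBufferUsed.addAndGet(size);
            batch.add(size);
        }
        return batch;
    }

    // One possible remedy: remember what this batch acquired and release it
    // if the batch is abandoned because of an exception.
    public static List<Long> readBatchSafe(List<Long> entrySizes, Predicate<Long> filter) {
        List<Long> batch = new ArrayList<>();
        long acquired = 0;
        try {
            for (long size : entrySizes) {
                if (!filter.test(size)) {
                    continue;
                }
                totalBufferUsed.addAndGet(size);
                acquired += size;
                batch.add(size);
            }
            return batch;
        } catch (RuntimeException e) {
            totalBufferUsed.addAndGet(-acquired); // give the quota back
            throw e;
        }
    }

    public static void main(String[] args) {
        // Hypothetical filter that fails on the third entry.
        Predicate<Long> failingFilter = size -> {
            if (size == 30L) {
                throw new IllegalStateException("filter failed");
            }
            return true;
        };
        List<Long> sizes = List.of(10L, 20L, 30L);

        try {
            readBatchLeaky(sizes, failingFilter);
        } catch (IllegalStateException expected) {
            // the batch is dropped, yet 10 + 20 bytes stay accounted for
        }
        System.out.println("leaky: " + totalBufferUsed.get()); // leaky: 30

        totalBufferUsed.set(0);
        try {
            readBatchSafe(sizes, failingFilter);
        } catch (IllegalStateException expected) {
            // the batch is dropped and its quota has been released
        }
        System.out.println("safe: " + totalBufferUsed.get()); // safe: 0
    }
}
```

In the leaky variant every failed batch permanently consumes part of the
shared quota, so after enough failures a counter like this would reach its
limit and, being shared, stall every reader that checks it.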