[jira] [Updated] (HBASE-27778) Incorrect ReplicationSourceWALReader.totalBufferUsed may cause replication hang up

chenglei (Jira) Tue, 04 Apr 2023 06:57:09 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chenglei updated HBASE-27778:
-----------------------------
    Description: When {{ReplicationSourceWALReader.readWALEntries}} reads a 
new WAL entry, {{ReplicationSourceWALReader.addEntryToBatch}} increases 
{{ReplicationSourceWALReader.totalBufferUsed}} by the size of that entry. 
However, the {{WALEntryBatch}} may never be put into 
{{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g. 
one thrown by {{WALEntryFilter.filter}} for a following WAL entry), and in 
that case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased. 
Because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to 
{{ReplicationSourceManager}}, after a long run replication to all peers may 
hang.  (was: When {{ReplicationSourceWALReader.readWALEntries}} reads a new 
WAL entry, {{ReplicationSourceWALReader.addEntryToBatch}} increases 
{{ReplicationSourceWALReader.totalBufferUsed}} by the size of that entry. 
However, the {{WALEntryBatch}} may never be put into 
{{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g. 
one thrown by {{WALEntryFilter.filter}} for a following WAL entry), and in 
that case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased. 
Because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to 
{{ReplicationSourceManager}}, after a long run all peers may slow down and 
eventually block completely.)

> Incorrect ReplicationSourceWALReader.totalBufferUsed may cause replication 
> hang up
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-27778
>                 URL: https://issues.apache.org/jira/browse/HBASE-27778
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.6.0, 3.0.0-alpha-3
>            Reporter: chenglei
>            Priority: Major
>
> When {{ReplicationSourceWALReader.readWALEntries}} reads a new WAL entry, 
> {{ReplicationSourceWALReader.addEntryToBatch}} increases 
> {{ReplicationSourceWALReader.totalBufferUsed}} by the size of that entry. 
> However, the {{WALEntryBatch}} may never be put into 
> {{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g. 
> one thrown by {{WALEntryFilter.filter}} for a following WAL entry), and in 
> that case {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased. 
> Because {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to 
> {{ReplicationSourceManager}}, after a long run replication to all peers may 
> hang.
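
For illustration, the accounting problem described in this issue can be modeled
with a small, self-contained Java sketch. It does not use the real HBase
classes: BufferAccountingSketch, readBatchLeaky and readBatchSafe are
hypothetical names, a byte[] stands in for a WAL entry, and a Predicate stands
in for WALEntryFilter.filter. The release-on-failure logic in readBatchSafe is
only one possible way to keep the counter consistent and is not necessarily the
fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/**
 * Simplified model of the buffer accounting described in HBASE-27778.
 * All names here are illustrative; none of this is actual HBase code.
 */
class BufferAccountingSketch {

  /** Stands in for the counter shared by all replication sources. */
  final AtomicLong totalBufferUsed = new AtomicLong();

  /** Stands in for entryBatchQueue; a consumer would release bytes after shipping. */
  final List<List<byte[]>> entryBatchQueue = new ArrayList<>();

  /** Leaky variant: bytes are counted per entry and never released on failure. */
  void readBatchLeaky(List<byte[]> wal, Predicate<byte[]> filter) {
    List<byte[]> batch = new ArrayList<>();
    for (byte[] entry : wal) {
      if (!filter.test(entry)) { // may throw for a later entry
        continue;
      }
      totalBufferUsed.addAndGet(entry.length); // counted immediately
      batch.add(entry);
    }
    entryBatchQueue.add(batch); // never reached if the filter throws
  }

  /** Safe variant: bytes counted for a batch that is never queued are released. */
  void readBatchSafe(List<byte[]> wal, Predicate<byte[]> filter) {
    List<byte[]> batch = new ArrayList<>();
    long reserved = 0;
    boolean queued = false;
    try {
      for (byte[] entry : wal) {
        if (!filter.test(entry)) {
          continue;
        }
        totalBufferUsed.addAndGet(entry.length);
        reserved += entry.length;
        batch.add(entry);
      }
      entryBatchQueue.add(batch);
      queued = true;
    } finally {
      if (!queued) {
        totalBufferUsed.addAndGet(-reserved); // undo the accounting for the lost batch
      }
    }
  }
}
```

With a filter that throws on the last entry of a WAL, readBatchLeaky leaves the
bytes of the earlier entries counted in totalBufferUsed even though nothing was
queued, while readBatchSafe returns the counter to its previous value. Since
the counter in this model is shared, each leaked batch permanently shrinks the
budget available to every other reader.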



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
