[ 
https://issues.apache.org/jira/browse/HBASE-28620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MisterWang updated HBASE-28620:
-------------------------------
    Description: 
When a peer changes, replication closes the reader and shipper created earlier.
However, if the shipper has still not stopped when the specified timeout
expires, the existing code simply returns without releasing the quota ("Not
cleaning buffer usage").
In one incident at my company, the quota filled up because it was not released
in time, so the WAL reader could not continue reading new data and replication
built up a backlog.

 

The log is as follows:

2024-05-20 20:00:00,796 WARN 
[RpcServer.default.FPRWQ.Fifo.read.handler=70,queue=1,port=16020] 
regionserver.ReplicationSourceShipper: Shipper clearWALEntryBatch method timed 
out whilst waiting reader/shipper thread to stop. Not cleaning buffer usage. 
Shipper alive: peer1; Reader alive: false
2024-05-20 20:00:01,351 WARN peer=peer1, can't read more edits from WAL as 
buffer usage 268435456B exceeds limit 268435456B
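A minimal Java sketch of the leak, under stated assumptions: the names below
(QuotaLeakSketch, Batch, tryEnqueue, awaitStop, clearLeaky, clearReleasing) are
hypothetical and are not HBase's actual classes or methods. It models the buffer
quota as one shared counter that the reader charges before queueing a batch.
clearLeaky returns on timeout without touching the counter, as the description
says the existing code does; clearReleasing is one possible alternative that
drains the queue and returns the quota either way. The limit is 256 MiB
(256 x 1024 x 1024 = 268435456 B), matching the value in the log.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical model of the replication buffer quota; not HBase's real classes.
public class QuotaLeakSketch {

  // A queued WAL entry batch; only its accounted size matters in this sketch.
  record Batch(long sizeBytes) {}

  // Assumption: one counter shared by every replication source on the server.
  static final AtomicLong totalBufferUsed = new AtomicLong();
  static final long LIMIT = 256L * 1024 * 1024; // 268435456 B, as in the log

  // Reader side: refuse to read once usage reaches the limit, else charge quota.
  static boolean tryEnqueue(LinkedBlockingQueue<Batch> queue, Batch batch) {
    if (totalBufferUsed.get() >= LIMIT) {
      return false; // corresponds to "can't read more edits from WAL"
    }
    totalBufferUsed.addAndGet(batch.sizeBytes());
    queue.add(batch);
    return true;
  }

  // Waits until threadsAlive reports false or the timeout expires.
  // Returns true only if the threads stopped in time.
  static boolean awaitStop(BooleanSupplier threadsAlive, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (threadsAlive.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        return false;
      }
    }
    return true;
  }

  // Leaky behaviour described in the issue: on timeout, return without releasing.
  static void clearLeaky(LinkedBlockingQueue<Batch> queue, BooleanSupplier threadsAlive,
      long timeoutMs) {
    if (!awaitStop(threadsAlive, timeoutMs)) {
      return; // quota charged for the queued batches is never given back
    }
    release(queue);
  }

  // Possible alternative: release queued batches whether or not the wait timed out.
  static void clearReleasing(LinkedBlockingQueue<Batch> queue, BooleanSupplier threadsAlive,
      long timeoutMs) {
    awaitStop(threadsAlive, timeoutMs);
    release(queue);
  }

  // poll() removes each batch atomically, so each batch's size is subtracted at most once.
  static long release(LinkedBlockingQueue<Batch> queue) {
    long freed = 0;
    Batch batch;
    while ((batch = queue.poll()) != null) {
      freed += batch.sizeBytes();
    }
    totalBufferUsed.addAndGet(-freed);
    return freed;
  }

  public static void main(String[] args) {
    BooleanSupplier shipperStuck = () -> true; // the shipper never stops, as in the log
    LinkedBlockingQueue<Batch> oldQueue = new LinkedBlockingQueue<>();
    for (int i = 0; i < 4; i++) {
      tryEnqueue(oldQueue, new Batch(64L * 1024 * 1024)); // fill the quota: 4 x 64 MiB
    }
    clearLeaky(oldQueue, shipperStuck, 50);
    System.out.println("after leaky clear: " + totalBufferUsed.get());
    System.out.println("new reader can read: "
        + tryEnqueue(new LinkedBlockingQueue<>(), new Batch(1)));
    clearReleasing(oldQueue, shipperStuck, 50);
    System.out.println("after releasing clear: " + totalBufferUsed.get());
  }
}
```

With a shipper that never stops, main prints a usage of 268435456 after the
leaky clear, reports that a new reader cannot read (false), and prints 0 after
the releasing clear. Releasing while the old shipper is still alive only covers
batches still in the queue; a batch the stuck shipper already holds would need
separate handling, which this sketch does not model.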

  was:
Shipper clearWALEntryBatch method timed out whilst waiting for the
reader/shipper threads to stop when the peer changed. Not cleaning buffer usage.

When the amount of data written to the table in the peer is relatively large,
the quota fills up and is not released, so the WAL reader cannot read new
data.



> replication quota leak when peer changes
> ----------------------------------------
>
>                 Key: HBASE-28620
>                 URL: https://issues.apache.org/jira/browse/HBASE-28620
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: MisterWang
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
