[ https://issues.apache.org/jira/browse/HBASE-28620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MisterWang updated HBASE-28620:
-------------------------------
    Description: 
When a peer changes, replication closes the reader and shipper it created 
earlier. However, the shipper may still be running when the specified timeout 
expires (it was interrupted but did not shut down properly). In that case the 
existing code simply returns without releasing the quota, logging "Not cleaning 
buffer usage."
In one incident at my company, the quota filled up because it was not 
released, so the WAL reader could not read new data and replication built up 
a backlog.
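
The reader/shipper accounting described above can be modelled in a few lines. 
The following is a minimal, self-contained Java sketch, not HBase code: 
QuotaLeakSketch, enqueueBatch, clearBatches, canReadMore and the leakOnTimeout 
switch are hypothetical names, and the 256 MiB limit is assumed only to match 
the log. In this model the reader charges each batch to a shared buffer quota 
and the bytes are subtracted only when the batch is cleared, so returning early 
on timeout leaves them charged. Releasing the pending batches even when the 
shipper has not stopped is shown as one possible remedy; it assumes every batch 
is taken from a shared queue exactly once, so two code paths can never both 
release it.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the reader/shipper buffer-quota accounting; not HBase code.
public class QuotaLeakSketch {
  // Assumed quota limit, chosen to match the 268435456B (256 MiB) in the log.
  static final long LIMIT = 256L * 1024 * 1024;

  final AtomicLong totalBufferUsed = new AtomicLong();
  // Batches read from the WAL but not yet shipped; their bytes still count against the quota.
  final ConcurrentLinkedQueue<Long> pending = new ConcurrentLinkedQueue<>();

  // Reader side: charge a batch to the quota and hand it to the shipper.
  void enqueueBatch(long sizeBytes) {
    totalBufferUsed.addAndGet(sizeBytes);
    pending.add(sizeBytes);
  }

  // Reader side (in this model): stop reading once usage reaches the limit.
  boolean canReadMore() {
    return totalBufferUsed.get() < LIMIT;
  }

  // Cleanup on peer change. leakOnTimeout = true mimics the reported behaviour:
  // if the shipper is still alive after the timeout, return early and release nothing.
  void clearBatches(Thread shipper, long timeoutMs, boolean leakOnTimeout)
      throws InterruptedException {
    shipper.interrupt();
    shipper.join(timeoutMs);
    if (shipper.isAlive() && leakOnTimeout) {
      return; // pending bytes stay charged to the quota: the leak
    }
    // poll() removes each batch at most once, so if the shipper also took batches
    // from this queue, no batch could be released by both code paths.
    Long size;
    while ((size = pending.poll()) != null) {
      totalBufferUsed.addAndGet(-size);
    }
  }

  // Fills the quota, runs cleanup against a shipper that ignores interrupts,
  // and returns the source so the caller can inspect the remaining usage.
  static QuotaLeakSketch runScenario(boolean leakOnTimeout) {
    QuotaLeakSketch source = new QuotaLeakSketch();
    CountDownLatch release = new CountDownLatch(1);
    Thread stuckShipper = new Thread(() -> {
      while (true) {
        try {
          release.await();
          return;
        } catch (InterruptedException ignored) {
          // interrupted, but keeps running instead of shutting down
        }
      }
    });
    stuckShipper.setDaemon(true);
    stuckShipper.start();
    source.enqueueBatch(LIMIT); // usage == limit, as in the log
    try {
      source.clearBatches(stuckShipper, 100, leakOnTimeout);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let the demo thread exit
    }
    return source;
  }

  public static void main(String[] args) {
    QuotaLeakSketch leaked = runScenario(true);
    System.out.println("early return: used=" + leaked.totalBufferUsed.get()
        + " canReadMore=" + leaked.canReadMore());
    QuotaLeakSketch released = runScenario(false);
    System.out.println("release anyway: used=" + released.totalBufferUsed.get()
        + " canReadMore=" + released.canReadMore());
  }
}
```

Running main shows the difference: with leakOnTimeout = true the usage stays 
at 268435456 bytes and canReadMore() returns false, matching the "can't read 
more edits from WAL" warning; with false it drops to 0. This sketch does not 
necessarily reflect the change in the linked pull request.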

 

The log is as follows:

2024-05-20 20:00:00,796 WARN [RpcServer.default.FPRWQ.Fifo.read.handler=70,queue=1,port=16020] regionserver.ReplicationSourceShipper: Shipper clearWALEntryBatch method timed out whilst waiting reader/shipper thread to stop. Not cleaning buffer usage. Shipper alive: peer1; Reader alive: false
2024-05-20 20:00:01,351 WARN peer=peer1, can't read more edits from WAL as buffer usage 268435456B exceeds limit 268435456B

  was:
When a peer changes, replication closes the reader and shipper it created 
earlier. However, the shipper may still be running when the specified timeout 
expires. In that case the existing code simply returns without releasing the 
quota, logging "Not cleaning buffer usage."
In one incident at my company, the quota filled up because it was not 
released, so the WAL reader could not read new data and replication built up 
a backlog.

 

The log is as follows:

2024-05-20 20:00:00,796 WARN [RpcServer.default.FPRWQ.Fifo.read.handler=70,queue=1,port=16020] regionserver.ReplicationSourceShipper: Shipper clearWALEntryBatch method timed out whilst waiting reader/shipper thread to stop. Not cleaning buffer usage. Shipper alive: peer1; Reader alive: false
2024-05-20 20:00:01,351 WARN peer=peer1, can't read more edits from WAL as buffer usage 268435456B exceeds limit 268435456B


> replication quota leak when peer changes
> ----------------------------------------
>
>                 Key: HBASE-28620
>                 URL: https://issues.apache.org/jira/browse/HBASE-28620
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: MisterWang
>            Priority: Critical
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
