[jira] [Comment Edited] (HBASE-22620) When a cluster open replication,regionserver will not clean up the walLog references on zk due to no wal entry need to be replicated

leizhang (JIRA) Sun, 23 Jun 2019 23:13:43 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870829#comment-16870829
 ]


leizhang edited comment on HBASE-22620 at 6/24/19 6:12 AM:
-----------------------------------------------------------

Thank you for the reply. By "no data to replicate" I mean that no entry in the
wal needs to be replicated from the log queue. I say this because the cleanup
logic for the WAL refs is reached through the shipEdits() method in
ReplicationSourceShipperThread.class. The code that calls shipEdits() is as
follows:
{code:java}
WALEntryBatch entryBatch = entryReader.take();
// send the entryBatch to the target cluster
shipEdits(entryBatch);
{code}
Then, inside the shipEdits() method, the call chain is:
{code:java}
shipEdits() -> updateLogPosition()
  -> ReplicationSourceManager.logPositionAndCleanOldLogs()
{code}
As far as I can see, logPositionAndCleanOldLogs() is called only when
entryReader has an entryBatch to replicate, and only then are the old log refs
removed from zk as expected (under the znode
/hbase/replication/rs/\{regionserver}/\{peerId}/). But when there is no entry
to replicate (for example, regionserver A hosts no regions of tables with
replication enabled, although replication is enabled on the cluster),
logPositionAndCleanOldLogs() is never triggered on A. The refs then stay in zk
forever, and the actual log files on hdfs are not cleaned up either. Over a
long time, as the logs keep rolling, lots of log files accumulate and cannot be
removed normally because of the refs in zk.
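
To make the behaviour described above concrete, here is a minimal toy model. It is not HBase code: the class WalRefModel, the method shipIfAny() and the wal names are hypothetical, and the TreeSet only stands in for the refs kept under the replication znode. The sketch assumes, as described above, that the cleanup is reached only when there is a batch to ship.

```java
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class WalRefModel {
    // Hypothetical stand-in for the wal refs stored under the replication znode.
    private final TreeSet<String> walRefs = new TreeSet<>();

    // A log roll registers a new wal in the replication queue.
    void rollLog(String walName) {
        walRefs.add(walName);
    }

    // Models logPositionAndCleanOldLogs(): drop every ref older than the current wal.
    void logPositionAndCleanOldLogs(String currentWal) {
        walRefs.headSet(currentWal).clear();
    }

    // Models the shipper: the cleanup is reached only when a batch exists.
    void shipIfAny(List<String> batch, String currentWal) {
        if (batch.isEmpty()) {
            return; // nothing to ship, so the cleanup is never reached
        }
        logPositionAndCleanOldLogs(currentWal);
    }

    int refCount() {
        return walRefs.size();
    }

    // Rolls the log `rolls` times and returns how many refs are left afterwards.
    static int simulate(boolean hasEdits, int rolls) {
        WalRefModel rs = new WalRefModel();
        for (int i = 1; i <= rolls; i++) {
            String wal = String.format("wal.%03d", i);
            rs.rollLog(wal);
            List<String> batch = hasEdits
                ? Collections.singletonList("edit")
                : Collections.<String>emptyList();
            rs.shipIfAny(batch, wal);
        }
        return rs.refCount();
    }

    public static void main(String[] args) {
        System.out.println("idle server refs:   " + simulate(false, 5));
        System.out.println("active server refs: " + simulate(true, 5));
    }
}
```

In this model the idle server keeps all 5 refs after 5 log rolls, while the server that ships edits keeps only the ref of its current wal.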

Consider two situations:

1. There is no data in a wal file.

2. There are entries in a wal file, but they will not be replicated (the table
does not have replication enabled, so the entries are skipped).

Just as you say, the entire wal file is still read, and the position in the
log file currently being replicated is updated normally. But the cleanup logic
for the old log file refs is never triggered, because no entry needs to be
replicated. What I observe on my test cluster confirms this.
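
The two situations can be sketched with another small toy model. Again this is not HBase code: WalReadModel, ReadResult and readWholeWal() are hypothetical names, and each wal entry is reduced to a flag saying whether its table has replication enabled. The assumption, taken from the description above, is that the reader advances through the whole file in every case but the cleanup is reached only when the filtered batch is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WalReadModel {
    // Result of reading one wal file to the end.
    static final class ReadResult {
        final int entriesRead;      // how far the reader advanced
        final List<Boolean> batch;  // entries that survived the filter

        ReadResult(int entriesRead, List<Boolean> batch) {
            this.entriesRead = entriesRead;
            this.batch = batch;
        }

        // In this model the cleanup is reached only when there is something to ship.
        boolean triggersCleanup() {
            return !batch.isEmpty();
        }
    }

    // Each Boolean is one wal entry: true if its table has replication enabled.
    static ReadResult readWholeWal(List<Boolean> walEntries) {
        List<Boolean> batch = new ArrayList<>();
        for (Boolean replicated : walEntries) {
            if (replicated) {
                batch.add(replicated); // kept for shipping
            }
            // entries of non-replicated tables are skipped
        }
        return new ReadResult(walEntries.size(), batch);
    }

    public static void main(String[] args) {
        ReadResult empty = readWholeWal(Collections.<Boolean>emptyList());
        ReadResult skipped = readWholeWal(Arrays.asList(false, false, false));
        ReadResult mixed = readWholeWal(Arrays.asList(false, true));
        System.out.println("situation 1: read=" + empty.entriesRead
            + " cleanup=" + empty.triggersCleanup());
        System.out.println("situation 2: read=" + skipped.entriesRead
            + " cleanup=" + skipped.triggersCleanup());
        System.out.println("replicated:  read=" + mixed.entriesRead
            + " cleanup=" + mixed.triggersCleanup());
    }
}
```

In both situations the whole file is consumed, yet the batch stays empty, so the model never reaches the cleanup step. Only the third case, with at least one replicated entry, does.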


was (Author: zl_cn_hbase):
Thank you for the reply. By "no data to replicate" I mean that no entry in the
wal needs to be replicated from the log queue. I say this because the cleanup
logic for the WAL refs is reached through the shipEdits() method in
ReplicationSourceShipperThread.class. The code that calls shipEdits() is as
follows:
{code:java}
WALEntryBatch entryBatch = entryReader.take();
// send the entryBatch to the target cluster
shipEdits(entryBatch);
{code}
Then, inside the shipEdits() method, the call chain is:
{code:java}
shipEdits() -> updateLogPosition()
  -> ReplicationSourceManager.logPositionAndCleanOldLogs()
{code}
As far as I can see, logPositionAndCleanOldLogs() is called only when
entryReader has an entryBatch to replicate, and only then are the old log refs
removed from zk as expected (under the znode
/hbase/replication/rs/\{regionserver}/\{peerId}/). But when there is no entry
to replicate (for example, regionserver A hosts no regions of tables with
replication enabled), logPositionAndCleanOldLogs() is never triggered on A.
The refs then stay in zk forever, and the actual log files on hdfs are not
cleaned up either. Over a long time, as the logs keep rolling, lots of log
files accumulate and cannot be removed normally because of the refs in zk.

Consider two situations:

1. There is no data in a wal file.

2. There are entries in a wal file, but they will not be replicated (the table
does not have replication enabled, so the entries are skipped).

Just as you say, the entire wal file is still read, and the position in the
log file currently being replicated is updated normally. But the cleanup logic
for the old log file refs is never triggered, because no entry needs to be
replicated. What I observe on my test cluster confirms this.

> When a cluster open replication,regionserver will not clean up the walLog 
> references on zk due to no wal entry need to be replicated
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22620
>                 URL: https://issues.apache.org/jira/browse/HBASE-22620
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 2.0.3, 1.4.9
>            Reporter: leizhang
>            Priority: Major
>
> When I enable the replication feature on my hbase cluster (20 regionserver
> nodes) and, for example, create a table with 3 regions, those regions open on
> 3 of the 20 regionservers. Because they have no data to replicate, the
> remaining 17 nodes accumulate lots of wal references under the zk node
> "/hbase/replication/rs/\{regionserver}/\{peerId}/" that are never cleaned up,
> so lots of wal files on hdfs are not cleaned up either. When I checked my
> test cluster after about four months, it had accumulated about 50,000 wal
> files in the oldWal directory on hdfs. The source code shows that only when
> there is data to replicate, and after some data has been replicated by the
> source endpoint, is the check for useless wal files executed and their
> references on zk cleaned, after which the useless wal files on hdfs are
> cleaned up normally. So I wonder whether we need another way to trigger the
> useless wal cleaning job in a replication cluster, maybe in the scheduled
> task that reports replication progress (like ReplicationStatisticsTask.class).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
