[
https://issues.apache.org/jira/browse/HBASE-22620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870829#comment-16870829
]
leizhang edited comment on HBASE-22620 at 6/24/19 6:12 AM:
-----------------------------------------------------------
Thank you for the reply. By "no data to replicate" I mean that no entry in the
wal needs to be replicated from the log queue. I say this because the logic
that cleans up the old log refs is reached through the shipEdits() method in
ReplicationSourceShipperThread.class. The code that calls shipEdits() is as
follows:
{code:java}
WALEntryBatch entryBatch = entryReader.take();
// send the entryBatch to the target cluster
shipEdits(entryBatch);
{code}
Then, inside the shipEdits() method, the call chain is:
{code:java}
shipEdits() -> updateLogPosition()
  -> ReplicationSourceManager.logPositionAndCleanOldLogs()
{code}
As I read it, logPositionAndCleanOldLogs() is called only when entryReader has
an entryBatch to replicate; only then are the oldLogs refs removed from zk
(under the znode /hbase/replication/rs/\{regionserver}/\{peerId}/). When there
is no entry to replicate (for example, regionserver A hosts no regions of
tables with replication enabled, although replication is enabled on the
cluster), logPositionAndCleanOldLogs() is never triggered on A. The refs then
remain in zk forever, and the actual log files on hdfs are not cleaned up
either. Over time, as the logs keep rolling, lots of log files accumulate and
cannot be removed because of the refs in zk.
Consider two situations:
1. There is no data in a wal file.
2. There are entries in a wal file, but they will not be replicated (the table
does not have replication enabled, so the entries are skipped).
As you say, the entire wal file is still read, and the position in the log file
currently being replicated is updated normally. But the cleanup of the old log
file refs is never triggered, because no entry needs to be replicated. What I
observe on my test cluster confirms this.
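To make the argument concrete, here is a toy model of the behaviour described
above. It is only a sketch and not the HBase code: WalQueueModel, onLogRoll(),
readAndShip() and simulate() are hypothetical names, a TreeSet stands in for
the children of the znode, and the WAL names are assumed to sort in roll order.

```java
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Toy model of one replication queue. Illustrative only, not HBase code. */
class WalQueueModel {
    // WAL names still referenced for this peer (stands in for the znode children).
    private final TreeSet<String> walRefs = new TreeSet<>();

    /** A log roll adds the new WAL to the queue. */
    void onLogRoll(String walName) {
        walRefs.add(walName);
    }

    /** Stand-in for logPositionAndCleanOldLogs(): drop refs older than currentWal. */
    void logPositionAndCleanOldLogs(String currentWal) {
        walRefs.headSet(currentWal, false).clear();
    }

    /** Stand-in for the shipper: cleanup is reached only if something is shipped. */
    void readAndShip(String currentWal, List<String> replicableEntries) {
        if (replicableEntries.isEmpty()) {
            return; // nothing to ship, so the cleanup below is never reached
        }
        logPositionAndCleanOldLogs(currentWal);
    }

    int refCount() {
        return walRefs.size();
    }

    /** Rolls the log `rolls` times and returns how many refs are left. */
    static int simulate(int rolls, boolean hasReplicableData) {
        WalQueueModel queue = new WalQueueModel();
        for (int i = 0; i < rolls; i++) {
            // Zero-padded so that names sort in roll order.
            String wal = String.format("wal.%05d", i);
            queue.onLogRoll(wal);
            List<String> entries = hasReplicableData
                ? Collections.singletonList("edit-" + i)
                : Collections.<String>emptyList();
            queue.readAndShip(wal, entries);
        }
        return queue.refCount();
    }

    public static void main(String[] args) {
        System.out.println("refs with replicable data:    " + simulate(100, true));
        System.out.println("refs without replicable data: " + simulate(100, false));
    }
}
```

In this model, a regionserver that ships something after every roll keeps only
the ref of its current wal, while a regionserver that never ships anything keeps
one ref per roll.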
> When a cluster open replication,regionserver will not clean up the walLog
> references on zk due to no wal entry need to be replicated
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-22620
> URL: https://issues.apache.org/jira/browse/HBASE-22620
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 2.0.3, 1.4.9
> Reporter: leizhang
> Priority: Major
>
> When I enable the replication feature on my hbase cluster (20 regionserver
> nodes) and, for example, create a table with 3 regions that are opened on 3
> of the 20 regionservers, the remaining 17 nodes have no data to replicate.
> They accumulate lots of wal references under the zk node
> "/hbase/replication/rs/\{regionserver}/\{peerId}/" that are never cleaned up,
> so lots of wal files on hdfs are not cleaned up either. When I checked my
> test cluster after about four months, it had accumulated about 50,000 wal
> files in the oldWal directory on hdfs. The source code shows that the check
> for useless wal files runs only when there is data to replicate, after some
> data has been replicated by the source endpoint; only then are their
> references on zk cleaned, so that the useless wal files on hdfs can be
> cleaned up normally. So I wonder whether we need another way to trigger the
> useless wal cleaning job in a replication cluster, maybe in the scheduled
> task that reports replication progress (like ReplicationStatisticsTask.class)?
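
One possible shape for such a periodic trigger is sketched below. This is an
assumption-laden illustration, not a proposed patch: PeriodicWalRefCleaner,
onLogRoll(), markFullyRead() and schedule() are hypothetical names, and the
sketch assumes the reader can report the newest wal it has finished reading,
which the description above does not confirm.

```java
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic cleaner. The names are illustrative, not HBase APIs. */
class PeriodicWalRefCleaner implements Runnable {
    // WAL names still referenced for this peer (stands in for the znode children).
    private final TreeSet<String> walRefs = new TreeSet<>();
    // Newest WAL the reader has finished, whether or not anything was shipped.
    private String fullyReadUpTo = null;

    synchronized void onLogRoll(String walName) {
        walRefs.add(walName);
    }

    synchronized void markFullyRead(String walName) {
        fullyReadUpTo = walName;
    }

    synchronized int refCount() {
        return walRefs.size();
    }

    /** One cleanup pass: drop the refs of every WAL that has been fully read. */
    @Override
    public synchronized void run() {
        if (fullyReadUpTo != null) {
            walRefs.headSet(fullyReadUpTo, true).clear();
        }
    }

    /** Runs the cleanup pass at a fixed rate, independently of shipEdits(). */
    ScheduledExecutorService schedule(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor; // the caller is responsible for shutting it down
    }
}
```

The point of the design is that the cleanup pass depends on read progress
rather than on a shipped batch, so a regionserver with nothing to replicate
would still release its old refs.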
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)