[
https://issues.apache.org/jira/browse/HBASE-23008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007129#comment-17007129
]
Michael Stack commented on HBASE-23008:
---------------------------------------
[~filtertip] There are comments over on the PR...
> ReplicationSourceShipper has no chance to delete hlog znode when the wal
> entry batch always empty
> -------------------------------------------------------------------------------------------------
>
> Key: HBASE-23008
> URL: https://issues.apache.org/jira/browse/HBASE-23008
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
>
> My live clusters are configured for master-master replication, and only one of
> them, the active cluster, receives puts.
> Recently I found a great many znodes under
> /hbase/replication/rs/#server/#peer in the backup cluster, at least 10000.
>
> I think the reason is that the WAL entries in the backup cluster are all
> filtered out by ClusterMarkingEntryFilter, so ReplicationSourceWALReader never
> puts any batch into entryBatchQueue, and ReplicationSourceShipper stays blocked
> at entryReader.take(), so it has no chance to delete the hlog znodes.
> The thread stacks of the WAL reader and the WAL shipper are below:
> {code:java}
> "main-EventThread.replicationSource,2.replicationSource.hostname%2C16020%2C1567586932902.hostname%2C16020%2C1567586932902.regiongroup-0,2.replicationSource.wal-reader.hostname%2C16020%2C1567586932902.hostname%2C16020%2C1567586932902.regiongroup-0,2" #157238 daemon prio=5 os_prio=0 tid=0x00007f7634be8800 nid=0x377ef waiting on condition [0x00007f6114c0e000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.handleEmptyWALEntryBatch(ReplicationSourceWALReader.java:192)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:142)
>
> "main-EventThread.replicationSource,2.replicationSource.hostname%2C16020%2C1567586932902.hostname%2C16020%2C1567586932902.regiongroup-0,2" #157237 daemon prio=5 os_prio=0 tid=0x00007f76350b0000 nid=0x377ee waiting on condition [0x00007f6108173000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007f6f99bb6718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.take(ReplicationSourceWALReader.java:248)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:108)
> {code}
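
The blocking pattern described in the report can be illustrated with a small standalone sketch. It is not HBase code: `EmptyBatchSketch`, `Batch`, `read`, `ship`, and the `replicated:` prefix are hypothetical stand-ins for the reader, the shipper, and the filter. It also uses `poll` with a timeout where the real shipper calls `take()`, only so that the example terminates. Enqueuing an empty batch that still carries the WAL position is shown as one possible direction, not necessarily what the PR does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical model of the reader/shipper hand-off; names are illustrative only.
final class EmptyBatchSketch {

    /** Stand-in for a WAL entry batch: surviving entries plus the WAL position reached. */
    static final class Batch {
        final List<String> entries;
        final long endPosition;

        Batch(List<String> entries, long endPosition) {
            this.entries = entries;
            this.endPosition = endPosition;
        }
    }

    /** Reader side: drops already-replicated entries; enqueues an empty batch only if asked to. */
    static void read(List<String> wal, BlockingQueue<Batch> queue, boolean enqueueEmpty)
            throws InterruptedException {
        List<String> kept = wal.stream()
                .filter(e -> !e.startsWith("replicated:"))
                .collect(Collectors.toList());
        if (!kept.isEmpty() || enqueueEmpty) {
            queue.put(new Batch(kept, wal.size()));
        }
    }

    /** Shipper side: returns the position it could record, or -1 if no batch arrived in time. */
    static long ship(BlockingQueue<Batch> queue, long timeoutMs) throws InterruptedException {
        Batch batch = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        return batch == null ? -1L : batch.endPosition;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> wal = Arrays.asList("replicated:put1", "replicated:put2");
        BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();

        // Every entry is filtered and nothing is enqueued: the shipper never learns the position.
        read(wal, queue, false);
        System.out.println("position without empty batch: " + ship(queue, 100));

        // The empty batch is enqueued anyway: the shipper can advance and clean up old logs.
        read(wal, queue, true);
        System.out.println("position with empty batch: " + ship(queue, 100));
    }
}
```

In the first call the shipper times out with nothing to act on, which corresponds to the thread parked in `LinkedBlockingQueue.take` in the dump above. In the second call it receives a position even though no entry survived filtering.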
--
This message was sent by Atlassian Jira
(v8.3.4#803005)