[
https://issues.apache.org/jira/browse/HBASE-26195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398621#comment-17398621
]
Rushabh Shah commented on HBASE-26195:
--------------------------------------
> since this is a rare condition and I don't expect it to affect availability
> much.
I agree that approach 3 is the correct one. Just adding one more data point.
During this slow hdfs sync issue, 21 region servers did a memstore rollback,
out of 295 RS in that cluster. If we had gone with the "Abort RS" route, we
would have lost around 7% of the capacity. In smaller clusters it would be even
more impactful, since those bad DNs would be part of many WAL pipelines.
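The 7% figure above is just the ratio of affected to total region servers; a
trivial sketch of the arithmetic (the class below is only for illustration):
{code:java}
public class CapacityImpact {
    public static void main(String[] args) {
        int affectedRegionServers = 21;  // RS that did a memstore rollback during the slow-sync window
        int totalRegionServers = 295;    // total RS in the cluster
        double lostCapacityPct = 100.0 * affectedRegionServers / totalRegionServers;
        // Prints ~7.1%: the serving capacity lost if every affected RS had aborted.
        System.out.printf("Capacity lost if all affected RS abort: %.1f%%%n", lostCapacityPct);
    }
}
{code}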
[~gjacoby] [~apurtell]
> Data is present in replicated cluster but not present in primary cluster.
> -------------------------------------------------------------------------
>
> Key: HBASE-26195
> URL: https://issues.apache.org/jira/browse/HBASE-26195
> Project: HBase
> Issue Type: Bug
> Components: Replication, wal
> Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
>
> We encountered a case where we are seeing some rows (via Phoenix) in
> replicated cluster but they are not present in source/active cluster.
> Triaging further we found memstore rollback logs in few of the region servers.
> {noformat}
> 2021-07-28 14:17:59,353 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,353 DEBUG [,queue=25,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,355 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,355 DEBUG [,queue=25,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> 2021-07-28 14:17:59,356 DEBUG [,queue=25,port=60020] regionserver.HRegion -
> rollbackMemstore rolled back 23
> {noformat}
> Looking further into the logs, we found that there were some hdfs-layer issues
> syncing the wal to hdfs. It was taking around 6 minutes to sync the wal. Logs below:
> {noformat}
> 2021-07-28 14:19:30,511 WARN [sync.0] hdfs.DataStreamer - Slow
> waitForAckedSeqno took 391210ms (threshold=30000ms). File being written:
> /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
> block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
> pipeline datanodes:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
> 2021-07-28 14:19:30,589 WARN [sync.1] hdfs.DataStreamer - Slow
> waitForAckedSeqno took 391148ms (threshold=30000ms). File being written:
> /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
> block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
> pipeline datanodes:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
> 2021-07-28 14:19:30,589 WARN [sync.2] hdfs.DataStreamer - Slow
> waitForAckedSeqno took 391147ms (threshold=30000ms). File being written:
> /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
> block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
> pipeline datanodes:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
> 2021-07-28 14:19:30,591 INFO [sync.0] wal.FSHLog - Slow sync cost: 391289
> ms, current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> 2021-07-28 14:19:30,591 INFO [sync.1] wal.FSHLog - Slow sync cost: 391227
> ms, current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> 2021-07-28 14:19:30,591 WARN [sync.1] wal.FSHLog - Requesting log roll
> because we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms,
> current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> 2021-07-28 14:19:30,591 INFO [sync.2] wal.FSHLog - Slow sync cost: 391227
> ms, current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> 2021-07-28 14:19:30,591 WARN [sync.2] wal.FSHLog - Requesting log roll
> because we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms,
> current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> 2021-07-28 14:19:30,591 WARN [sync.3] hdfs.DataStreamer - Slow
> waitForAckedSeqno took 391217ms (threshold=30000ms). File being written:
> /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
> block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
> pipeline datanodes:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
> 2021-07-28 14:19:30,591 INFO [sync.3] wal.FSHLog - Slow sync cost: 391217
> ms, current pipeline:
> [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
> DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
> DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
> {noformat}
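> To put the numbers above in perspective: a waitForAckedSeqno of ~391,000 ms is
> about 6.5 minutes, versus the 30 s HDFS slow-ack warning threshold and the 10 s
> FSHLog slow-sync roll threshold from the logs. A trivial conversion (the class
> below is only illustrative):
> {code:java}
> public class SlowSyncNumbers {
>     public static void main(String[] args) {
>         long syncMs = 391_210L;     // "Slow waitForAckedSeqno took 391210ms"
>         long hdfsWarnMs = 30_000L;  // DataStreamer warning threshold (threshold=30000ms)
>         long rollMs = 10_000L;      // FSHLog log-roll threshold (threshold=10000 ms)
>         System.out.printf("sync took %.1f minutes (%dx the HDFS warning threshold, %dx the roll threshold)%n",
>                 syncMs / 60_000.0, syncMs / hdfsWarnMs, syncMs / rollMs);
>     }
> }
> {code}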
> Just to add more details, we also found that there was a namenode failover. It
> might be a contributing factor to why the syncs were slow, but we are not sure.
> {noformat}
> 2021-07-28 14:19:30,390 WARN [<rs-host>:60020-3]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491727 ClientNamenodeProtocolTranslatorPB.mkdirs over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,390 WARN [<rs-host>:60020-7]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491726 ClientNamenodeProtocolTranslatorPB.mkdirs over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [20%2C1626191371499,1]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491723 ClientNamenodeProtocolTranslatorPB.getBlockLocations over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [0.40:60020.logRoller]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491729 ClientNamenodeProtocolTranslatorPB.getFileInfo over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [MemStoreFlusher.1]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491730 ClientNamenodeProtocolTranslatorPB.getFileInfo over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [371499.1627480615620]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491731 ClientNamenodeProtocolTranslatorPB.addBlock over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [c@prod] retry.RetryInvocationHandler - A
> failover has occurred since the start of call #8491725
> ClientNamenodeProtocolTranslatorPB.renewLease over
> <namenode-hostname>/<namenode-ip>:8020
> 2021-07-28 14:19:30,391 WARN [91371499@<Replicatonsource-thread>]
> retry.RetryInvocationHandler - A failover has occurred since the start of
> call #8491724 ClientNamenodeProtocolTranslatorPB.getBlockLocations over
> <namenode-hostname>/<namenode-ip>:8020
> {noformat}
> We use the default value of hbase.regionserver.wal.sync.timeout, which is 5
> minutes.
> Since the sync took more than 6 minutes (longer than
> hbase.regionserver.wal.sync.timeout), the client received an Exception and we
> rolled back the mutations in the memstore on the primary cluster.
> But the hdfs sync succeeded +eventually+, so the row was present in the wal and
> was eligible for replication.
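> A rough sketch of that race, just to make the failure mode concrete (the class
> below is purely illustrative, not the actual FSHLog/HRegion code path; the
> timings are scaled down from the real ~391 s sync and 300 s timeout):
> {code:java}
> import java.util.concurrent.*;
>
> // Illustrative only: a WAL sync that outlives hbase.regionserver.wal.sync.timeout
> // (default 300000 ms = 5 min) makes the caller fail the client and roll the
> // mutations back out of the memstore, but the WAL append itself is never undone.
> // Once the lagging sync finally completes, the edit is durable in the WAL and
> // replication ships it, so the row exists on the replica but not on the primary.
> public class WalSyncTimeoutSketch {
>     public static void main(String[] args) throws Exception {
>         ExecutorService walSyncPool = Executors.newSingleThreadExecutor();
>
>         // Stand-in for the HDFS pipeline ack; 4 s here plays the role of the ~391 s sync.
>         Future<Void> syncFuture = walSyncPool.submit(() -> {
>             Thread.sleep(4_000);
>             return null;            // the sync does succeed, just very late
>         });
>
>         long syncTimeoutMs = 3_000; // plays the role of the 5 minute wal.sync.timeout
>         try {
>             syncFuture.get(syncTimeoutMs, TimeUnit.MILLISECONDS);
>             System.out.println("sync finished within timeout -> memstore entry kept");
>         } catch (TimeoutException e) {
>             // This is what happened here: stop waiting, surface an exception to the
>             // client, and roll the mutations back from the memstore on the primary,
>             // while the already-appended WAL entry stays eligible for replication.
>             System.out.println("sync timed out -> rollbackMemstore, client sees an exception");
>         } finally {
>             walSyncPool.shutdownNow();
>         }
>     }
> }
> {code}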
> [~bharathv] [[email protected]] [~gjacoby] helped brainstorm possible
> solutions. I will update the possible approaches in the comments section.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)