[
https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637685
]
ASF GitHub Bot logged work on HDFS-16171:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Aug/21 05:03
Start Date: 13/Aug/21 05:03
Worklog Time Spent: 10m
Work Description: virajjasani commented on pull request #3280:
URL: https://github.com/apache/hadoop/pull/3280#issuecomment-898192874
FYI @ferhui @amahussein, I have filed the Jira.
How is the flakiness resolved?
The number of under-replicated blocks on DN2 can be either 3 or 4, depending on
how many blocks are actually available in the Datanode storage at that point.
Hence, to make sure that we have 4 under-replicated blocks once both DN1 and
DN2 are decommissioned, we first need to wait until a total of 8 blocks
(including replicas) has been reported by both DNs together. This is the
additional check. Without it, 1 replica is sometimes not yet reported when we
start decommissioning, so the test cannot assert that all 4 blocks are
under-replicated, and it fails intermittently.
Hence, I have added this validation before we start decommissioning DN1.
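For illustration only, the idea of waiting for all replicas to be reported could be sketched as below. This is a self-contained sketch, not the code in the PR: the class `BlockReportWait`, the helpers `waitFor` and `waitForTotalReportedBlocks`, and the two suppliers standing in for the per-DN block counts are all hypothetical names. In the real test a polling utility such as Hadoop's `GenericTestUtils.waitFor` would presumably play the role of the hand-written `waitFor` here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

class BlockReportWait {

  /** Polls check every intervalMillis until it holds; fails after timeoutMillis. */
  static void waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new IllegalStateException(
            "Condition not met within " + timeoutMillis + " ms");
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  /**
   * Hypothetical helper: blocks until the two datanodes together report
   * expectedTotal blocks (replicas included).
   */
  static void waitForTotalReportedBlocks(IntSupplier dn1Blocks, IntSupplier dn2Blocks,
      int expectedTotal, long intervalMillis, long timeoutMillis) {
    waitFor(() -> dn1Blocks.getAsInt() + dn2Blocks.getAsInt() == expectedTotal,
        intervalMillis, timeoutMillis);
  }

  public static void main(String[] args) {
    // Simulated counts (assumption): DN1 always reports 4 blocks, while DN2
    // reports 3 on the first two polls and 4 from the third poll onwards.
    int[] polls = {0};
    IntSupplier dn1 = () -> 4;
    IntSupplier dn2 = () -> ++polls[0] < 3 ? 3 : 4;

    // Only start decommissioning once all 8 replicas are visible.
    waitForTotalReportedBlocks(dn1, dn2, 8, 10, 5_000);
    System.out.println("All 8 replicas reported; safe to start decommissioning DN1");
  }
}
```

Gating the decommission on the combined count of 8 removes the race: the later assertion on 4 under-replicated blocks no longer depends on whether the last replica happened to be reported in time.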
After the recent changes, I haven't seen the test fail across multiple test
runs. Could you please take a look?
Thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 637685)
Time Spent: 20m (was: 10m)
> testDecommissionStatus is flaky (for both TestDecommissioningStatus and
> TestDecommissioningStatusWithBackoffMonitor)
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR]
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
> Time elapsed: 3.299 s <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4>
> but was:<3>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
> at
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)