[
https://issues.apache.org/jira/browse/HDFS-16456?focusedWorklogId=751563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-751563
]
ASF GitHub Bot logged work on HDFS-16456:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Apr/22 13:45
Start Date: 01/Apr/22 13:45
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #4126:
URL: https://github.com/apache/hadoop/pull/4126#issuecomment-1085920723
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 0s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 34s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 15s | | trunk passed |
| +1 :green_heart: | compile | 22m 54s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 20m 10s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 3m 38s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 25s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 30s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 5m 58s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 33s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 17s | | the patch passed |
| +1 :green_heart: | compile | 22m 15s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 22m 15s | | the patch passed |
| +1 :green_heart: | compile | 20m 2s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 20m 2s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 3m 36s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 19s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 27s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 36s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 18s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 11s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 49s | | hadoop-common in the patch passed. |
| -1 :x: | unit | 420m 34s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4126/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 11s | | The patch does not generate ASF License warnings. |
| | | 650m 45s | | |
| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4126/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4126 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 9a71fa74327b 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 372cf01e09848e8f3372b3eca599f7f109d59c2f |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4126/2/testReport/ |
| Max. process+thread count | 2533 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4126/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 751563)
Time Spent: 40m (was: 0.5h)
> EC: Decommissioning a rack with only one DN will fail when the rack number is
> equal to the replication number
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-16456
> URL: https://issues.apache.org/jira/browse/HDFS-16456
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Labels: pull-request-available
> Attachments: HDFS-16456.001.patch, HDFS-16456.002.patch,
> HDFS-16456.003.patch, HDFS-16456.004.patch, HDFS-16456.005.patch,
> HDFS-16456.006.patch, HDFS-16456.007.patch, HDFS-16456.008.patch,
> HDFS-16456.009.patch, HDFS-16456.010.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In the scenario below, decommissioning will fail with the TOO_MANY_NODES_ON_RACK reason:
> # Enable an EC policy, such as RS-6-3-1024k.
> # The number of racks in the cluster is less than or equal to the replication
> number (9).
> # One rack has only a single DN; decommission that DN.
> The root cause is in the
> BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack() function, which
> computes a limit parameter, maxNodesPerRack, for choosing targets. In this
> scenario, maxNodesPerRack is 1, which means only one datanode may be chosen
> from each rack.
> {code:java}
> protected int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
>   ...
>   // If more replicas than racks, evenly spread the replicas.
>   // This calculation rounds up.
>   int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1;
>   return new int[] {numOfReplicas, maxNodesPerRack};
> }
> {code}
> The line int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1; is
> evaluated here with totalNumOfReplicas = 9 and numOfRacks = 9.
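> Worked out explicitly (a standalone arithmetic sketch, not the HDFS source):
> {code:java}
> // RS-6-3-1024k: 6 data + 3 parity = 9 total replicas, spread over 9 racks.
> int totalNumOfReplicas = 9;
> int numOfRacks = 9;
> // The formula rounds up, but 8 / 9 truncates to 0 in integer division:
> int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1; // = 1
> {code}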
> When we decommission a DN that is the only node in its rack, chooseOnce() in
> BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder() throws
> NotEnoughReplicasException, but the exception is not caught, so the code never
> falls back to the chooseEvenlyFromRemainingRacks() function.
> During decommissioning, after targets are chosen, the verifyBlockPlacement()
> function returns a total rack count that still includes the invalid rack, so
> BlockPlacementStatusDefault::isPlacementPolicySatisfied() returns false, which
> also causes decommissioning to fail.
> {code:java}
> public BlockPlacementStatus verifyBlockPlacement(DatanodeInfo[] locs,
>     int numberOfReplicas) {
>   if (locs == null)
>     locs = DatanodeDescriptor.EMPTY_ARRAY;
>   if (!clusterMap.hasClusterEverBeenMultiRack()) {
>     // only one rack
>     return new BlockPlacementStatusDefault(1, 1, 1);
>   }
>   // Count locations on different racks.
>   Set<String> racks = new HashSet<>();
>   for (DatanodeInfo dn : locs) {
>     racks.add(dn.getNetworkLocation());
>   }
>   return new BlockPlacementStatusDefault(racks.size(), numberOfReplicas,
>       clusterMap.getNumOfRacks());
> }
> {code}
> {code:java}
> public boolean isPlacementPolicySatisfied() {
>   return requiredRacks <= currentRacks || currentRacks >= totalRacks;
> }
> {code}
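> With the numbers from this scenario the check fails: 9 racks are required,
> only 8 racks can still hold replicas once the decommissioning rack is
> excluded, yet totalRacks still counts the invalid rack. A standalone
> illustration of the failing check (not the HDFS class itself):
> {code:java}
> int requiredRacks = 9; // RS-6-3 wants each of the 9 blocks on its own rack
> int currentRacks = 8;  // the rack whose only DN is decommissioning is unusable
> int totalRacks = 9;    // still includes the invalid rack
> boolean satisfied = requiredRacks <= currentRacks || currentRacks >= totalRacks;
> // satisfied == false, so decommissioning is reported as a placement failure.
> {code}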
> Based on the above analysis, we should make the following modifications to
> fix it:
> # In startDecommission() or stopDecommission(), we should also update
> numOfRacks in the NetworkTopology class. Otherwise, choosing targets may fail
> because maxNodesPerRack is too small, and even if choosing targets succeeds,
> isPlacementPolicySatisfied() will still return false and cause
> decommissioning to fail.
> # In BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder(), the first
> chooseOnce() call should also be wrapped in try...catch; otherwise it will
> not fall back to chooseEvenlyFromRemainingRacks() when the exception is
> thrown (see the sketch after this list).
> # In verifyBlockPlacement(), we need to exclude invalid racks from the total
> numOfRacks; otherwise isPlacementPolicySatisfied() will return false and data
> reconstruction will fail.
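> A minimal sketch of the second fix. The parameter lists below are
> placeholders assumed from the surrounding method, not the actual patch in the
> pull request:
> {code:java}
> // Sketch: wrap the first chooseOnce() so NotEnoughReplicasException
> // triggers the fallback instead of propagating out of the method.
> try {
>   // First attempt: at most maxNodesPerRack targets per rack.
>   chooseOnce(numOfReplicas, writer, excludedNodes, blocksize,
>       maxNodesPerRack, results, avoidStaleNodes, storageTypes);
> } catch (NotEnoughReplicasException e) {
>   // A rack whose only DN is decommissioning cannot supply a target;
>   // spread the remaining replicas over the remaining racks instead.
>   chooseEvenlyFromRemainingRacks(writer, excludedNodes, blocksize,
>       maxNodesPerRack, results, avoidStaleNodes, storageTypes,
>       totalReplicaExpected, e);
> }
> {code}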
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]