[ https://issues.apache.org/jira/browse/HDFS-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609235#comment-17609235 ]
ASF GitHub Bot commented on HDFS-16775:
---------------------------------------
hadoop-yetus commented on PR #4902:
URL: https://github.com/apache/hadoop/pull/4902#issuecomment-1257253920
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 42s | | trunk passed |
| +1 :green_heart: | compile | 1m 38s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 16s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 35s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 16s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 30s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 21s | | the patch passed |
| +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 23s | | the patch passed |
| +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 21s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 57s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 20s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 25s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 12s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 239m 54s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. |
| | | 349m 59s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4902 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 089adc9d0748 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 67e2449bd902302468028b52d1fc63905c1efc06 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/2/testReport/ |
| Max. process+thread count | 2936 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> Improve BlockPlacementPolicyRackFaultTolerant's chooseOnce
> ----------------------------------------------------------
>
> Key: HDFS-16775
> URL: https://issues.apache.org/jira/browse/HDFS-16775
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> In our online cluster, decommissioning a datanode is relatively slow when
> EC blocks exist, and the NameNode log contains many INFO messages such as
> 'Not enough replicas was chosen. Reason:
> {TOO_MANY_NODES_ON_RACK=13,NO_REQUIRED_STORAGE_TYPE=1,NOT_IN_SERVICE=5}',
> as follows:
> {code:java}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseRandom(904)) - [
> Node /rack1/ip1:50010 [
> Datanode ip1:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip2:50010 [
> Datanode ip2:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip3:50010 [
> Datanode ip3:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip5:50010 [
> Datanode ip5:50010 is not chosen since the node is not in service.
> Node /rack1/ip6:50010 [
> Datanode ip6:50010 is not chosen since the rack has too many chosen nodes.
> Datanode None is not chosen since required storage types are unavailable for
> storage type DISK.
> 2022-08-17 14:22:53,133 INFO blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseRandom(912))
> - Not enough replicas was chosen. Reason: {TOO_MANY_NODES_ON_RACK=4,
> NO_REQUIRED_STORAGE_TYPE=1, NOT_IN_SERVICE=1}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseLocalRack(718))
> - Failed to choose from local rack (location = /rack1), retry with the rack
> of the next replica (location = /rack2)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:914)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:800)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:710)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:670)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:220)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:96)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:478)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:350)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:170)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:63)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:2089)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:2027)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:5137)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:5003)
> at java.lang.Thread.run(Thread.java:748)
>
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseRandom(904)) - [
> Node /rack2/ip6:50010 [
> Datanode ip6:50010 is not chosen since the node is not in service.
> Node /rack2/ip7:50010 [
> Datanode ip7:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip8:50010 [
> Datanode ip8:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip9:50010 [
> Datanode ip9:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip10:50010 [
> Datanode ip10:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip11:50010 [
> Datanode ip11:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip12:50010 [
> Datanode ip12:50010 is not chosen since the rack has too many chosen nodes.
> ...
> Datanode None is not chosen since required storage types are unavailable
> for storage type DISK.
> 2022-08-17 14:22:53,133 INFO blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseRandom(912))
> - Not enough replicas was chosen. Reason: {TOO_MANY_NODES_ON_RACK=16,
> NO_REQUIRED_STORAGE_TYPE=1,NOT_IN_SERVICE=1}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseFromNextRack(748))
> - Failed to choose from the next rack (location = /rack2), retry choosing
> randomly
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:914)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:800)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:745)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:722)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:670)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:220)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:96)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:478)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:350)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:170)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:63)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:2089)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:2027)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:5137)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:5003)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This seriously affects the datanode decommissioning speed.
> The current process for choosing a target DN for EC blocks is:
> chooseLocalStorage->chooseLocalRack->chooseFromNextRack->chooseRandom
> 1. chooseLocalStorage chooses localMachine as the target. localMachine is
> srcNodes[0]; it may not be decommissioning but is in the excluded list, so it
> is not available.
> 2. chooseLocalRack chooses one node from the rack that localMachine is on;
> that rack may already contain a chosen node, so it is not available.
> 3. chooseFromNextRack takes the next node in srcNodes and retries with its
> rack; that rack may also already contain a chosen node, so it is not available.
> 4. The last retry chooses randomly.
> So we can optimize this logic.
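The four-step fallback described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the Hadoop implementation: the class `PlacementFallbackSketch`, its helper methods, the string-based node paths, and the fixed one-node-per-rack limit in the demo are all hypothetical, and the "random" step scans candidates in list order so the result is deterministic. It shows how a request can fail three scoped attempts before the final cluster-wide attempt succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the fallback chain; not Hadoop code.
final class PlacementFallbackSketch {

    // A node is written as "/rackN/ipM"; its rack is everything before the last '/'.
    static String rackOf(String node) {
        return node.substring(0, node.lastIndexOf('/'));
    }

    // A node is usable if it is not excluded and its rack is below the per-rack limit.
    static boolean isGood(String node, Set<String> excluded,
                          Map<String, Integer> chosenPerRack, int maxPerRack) {
        return !excluded.contains(node)
            && chosenPerRack.getOrDefault(rackOf(node), 0) < maxPerRack;
    }

    // Returns the first usable node in the given rack, or in any rack if scope is "".
    // The real policy picks randomly; scanning in order keeps this sketch deterministic.
    static String chooseFromScope(String scope, List<String> allNodes, Set<String> excluded,
                                  Map<String, Integer> chosenPerRack, int maxPerRack) {
        for (String node : allNodes) {
            boolean inScope = scope.isEmpty() || rackOf(node).equals(scope);
            if (inScope && isGood(node, excluded, chosenPerRack, maxPerRack)) {
                return node;
            }
        }
        return null;
    }

    // Walks the four stages in order and records each stage that was attempted in 'trace'.
    static String chooseWithFallback(List<String> srcNodes, List<String> allNodes,
                                     Set<String> excluded, Map<String, Integer> chosenPerRack,
                                     int maxPerRack, List<String> trace) {
        String local = srcNodes.get(0);
        trace.add("chooseLocalStorage");
        if (isGood(local, excluded, chosenPerRack, maxPerRack)) {
            return local;
        }
        trace.add("chooseLocalRack");
        String picked = chooseFromScope(rackOf(local), allNodes, excluded, chosenPerRack, maxPerRack);
        if (picked != null) {
            return picked;
        }
        // Retry once with the rack of the next source node that sits on a different rack.
        for (String src : srcNodes) {
            if (!rackOf(src).equals(rackOf(local))) {
                trace.add("chooseFromNextRack");
                picked = chooseFromScope(rackOf(src), allNodes, excluded, chosenPerRack, maxPerRack);
                if (picked != null) {
                    return picked;
                }
                break;
            }
        }
        trace.add("chooseRandom");
        return chooseFromScope("", allNodes, excluded, chosenPerRack, maxPerRack);
    }

    // Demo topology: rack1 and rack2 each already hold one chosen node; rack3 is empty.
    static String runDemo(List<String> trace) {
        List<String> all = Arrays.asList(
            "/rack1/ip1", "/rack1/ip2", "/rack2/ip3", "/rack2/ip4", "/rack3/ip5");
        List<String> src = Arrays.asList("/rack1/ip1", "/rack2/ip3");
        Set<String> excluded = new HashSet<>(src);
        Map<String, Integer> chosenPerRack = new HashMap<>();
        chosenPerRack.put("/rack1", 1);
        chosenPerRack.put("/rack2", 1);
        return chooseWithFallback(src, all, excluded, chosenPerRack, 1, trace);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        String picked = runDemo(trace);
        System.out.println(trace + " -> " + picked);
    }
}
```

In this toy topology the only usable node is on `/rack3`, so the first three stages are wasted work and each one would log a failure in the real NameNode. Skipping scoped attempts that cannot succeed is the kind of optimization the issue is asking for.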