[ 
https://issues.apache.org/jira/browse/HDFS-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606323#comment-17606323
 ] 

ASF GitHub Bot commented on HDFS-16775:
---------------------------------------

hadoop-yetus commented on PR #4902:
URL: https://github.com/apache/hadoop/pull/4902#issuecomment-1250335997

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  2s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | -1 :x: |  spotbugs  |   3m 25s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  22m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 239m 53s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 10s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 348m 48s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  org.apache.hadoop.net.Node is incompatible with expected argument type 
DatanodeStorageInfo in 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(int,
 Node, Set, long, int, List, boolean, EnumMap)  At 
BlockPlacementPolicyRackFaultTolerant.java:argument type DatanodeStorageInfo in 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(int,
 Node, Set, long, int, List, boolean, EnumMap)  At 
BlockPlacementPolicyRackFaultTolerant.java:[line 226] |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4902 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 073d50524576 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9ca1469a3d1bc31c6e52d3b219f7af2aeec9459c |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/1/testReport/ |
   | Max. process+thread count | 3104 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4902/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Improve BlockPlacementPolicyRackFaultTolerant's chooseOnce
> ----------------------------------------------------------
>
>                 Key: HDFS-16775
>                 URL: https://issues.apache.org/jira/browse/HDFS-16775
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> In our online cluster ,for the existence of EC blocks, the decommissioning 
> datanode speed is relatively slow,
> there are many info logs 'Not enough replicas was chosen. Reason: 
> {TOO_MANY_NODES_ON_RACK=13,NO_REQUIRED_STORAGE_TYPE=1,NOT_IN_SERVICE=5}' in 
> the NameNode log ,
> as follow:
> {code:java}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(904)) - [
> Node /rack1/ip1:50010 [
> Datanode ip1:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip2:50010 [
> Datanode ip2:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip3:50010 [
> Datanode ip3:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack1/ip5:50010 [
> Datanode ip5:50010 is not chosen since the node is not in service.
> Node /rack1/ip6:50010 [
> Datanode ip6:50010 is not chosen since the rack has too many chosen nodes.
> Datanode None is not chosen since required storage types are unavailable  for 
> storage type DISK.
> 2022-08-17 14:22:53,133 INFO  blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(912)) 
> - Not enough replicas was chosen. Reason: {TOO_MANY_NODES_ON_RACK=4, 
> NO_REQUIRED_STORAGE_TYPE=1, NOT_IN_SERVICE=1}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseLocalRack(718)) 
> - Failed to choose from local rack (location = /rack1), retry with the rack 
> of the next replica (location = /rack2)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:914)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:800)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:710)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:670)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:220)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:96)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:478)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:350)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:170)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:63)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:2089)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:2027)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:5137)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:5003)
>         at java.lang.Thread.run(Thread.java:748)
>   
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(904)) - [
> Node /rack2/ip6:50010 [
>   Datanode ip6:50010 is not chosen since the node is not in service.
> Node /rack2/ip7:50010 [
>   Datanode ip7:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip8:50010 [
>   Datanode ip8:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip9:50010 [
>   Datanode ip9:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip10:50010 [
>   Datanode ip10:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip11:50010 [
>   Datanode ip11:50010 is not chosen since the rack has too many chosen nodes.
> Node /rack2/ip12:50010 [
>   Datanode ip12:50010 is not chosen since the rack has too many chosen nodes.
> ...
>   Datanode None is not chosen since required storage types are unavailable  
> for storage type DISK.
> 2022-08-17 14:22:53,133 INFO  blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(912)) 
> - Not enough replicas was chosen. Reason: {TOO_MANY_NODES_ON_RACK=16, 
> NO_REQUIRED_STORAGE_TYPE=1,NOT_IN_SERVICE=1}
> 2022-08-17 14:22:53,133 DEBUG blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseFromNextRack(748)) 
> - Failed to choose from the next rack (location = /rack2), retry choosing 
> randomly 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:914)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:800)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:745)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:722)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:670)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:220)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:96)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:478)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:350)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:170)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:63)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:2089)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:2027)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:5137)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:5003)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> this seriously affects the datanode  decommissioning speed
> The process of choose target dn for the current EC : 
> chooseLocalStorage->chooseLocalRack->chooseFromNextRack->chooseRandom
> 1.chooseLocalStorage choose localMachine as the target ,and localMachine is 
> srcNodes[0], it maybe not decommissioning and was in the excluded list, so is 
> not available
> 2.chooseLocalRack choose one node from the rack that localMachine is on, 
> maybe the rack maybe has already exists one node ,so is not available
> 3.chooseFromNextRack choose next node on the srcNodes retry with its rack, 
> maybe the rack has already exists one node, so is not available
> 4.last retry choose randomly
> So, We can optimize this logic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to