[ https://issues.apache.org/jira/browse/HDFS-17620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881473#comment-17881473 ]
ASF GitHub Bot commented on HDFS-17620:
---------------------------------------

hadoop-yetus commented on PR #7035:
URL: https://github.com/apache/hadoop/pull/7035#issuecomment-2348089297

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 55s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 58s | | trunk passed |
| +1 :green_heart: | compile | 1m 25s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | checkstyle | 1m 11s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| +1 :green_heart: | spotbugs | 3m 16s | | trunk passed |
| +1 :green_heart: | shadedclient | 36m 34s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| -1 :x: | mvninstall | 0m 50s | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| -1 :x: | compile | 0m 54s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04. |
| -1 :x: | javac | 0m 54s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04. |
| -1 :x: | compile | 0m 49s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt) | hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05. |
| -1 :x: | javac | 0m 49s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt) | hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05. |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 58s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 8 new + 13 unchanged - 0 fixed = 21 total (was 13) |
| -1 :x: | mvnsite | 0m 53s | [/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | javadoc | 1m 0s | | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 |
| +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| -1 :x: | spotbugs | 0m 51s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| -1 :x: | shadedclient | 15m 5s | | patch has errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 0m 53s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 111m 59s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7035 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 28dc1d7b17c4 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a6902fe609749d6b5f6c6f756de570591f9b7e12 |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/testReport/ |
| Max. process+thread count | 552 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7035/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
> Better block placement for small EC files
> -----------------------------------------
>
>                 Key: HDFS-17620
>                 URL: https://issues.apache.org/jira/browse/HDFS-17620
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding, namenode
>    Affects Versions: 3.3.6
>            Reporter: Junegunn Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-09-10-13-22-50-247.png, screenshot-1.png
>
> h2. Problem description
>
> If an erasure-coded file is not large enough to fill the stripe width of the EC policy, the block distribution can be suboptimal.
>
> For example, an RS-6-3-1024K EC file smaller than 1024K will have 1 data block and 3 parity blocks. While all 9 (6 + 3) storage locations are chosen by the block placement policy, only 4 of them are used: the first location holds the data block, and the last 3 locations hold the parity blocks. If the cluster has a very small number of racks (e.g. 3), with the current scheme of finding a pipeline with the shortest path, the last nodes are likely to be in the same rack, resulting in a suboptimal rack distribution.
> {noformat}
> Locations: N1 N2 N3 N4 N5 N6 N7 N8 N9
> Racks:     R1 R1 R1 R2 R2 R2 R3 R3 R3
> Blocks:    D1                P1 P2 P3
> {noformat}
> We can see that the blocks are stored in only 2 racks, not 3.
>
> Because the block group does not span enough racks, an {{ErasureCodingWork}} will later be created to replicate a block to a new rack; however, the current code tries to copy the block to the first node in the chosen locations, regardless of its rack. So it is not guaranteed to improve the situation, and we constantly see {{PendingReconstructionMonitor timed out}} messages in the log.
>
> h2. Proposed solution
>
> 1. Reorder the chosen locations by rack so that the parity blocks are stored in as many racks as possible (see the sketch after this quoted description).
> 2. Make {{ErasureCodingWork}} try to find a target on a new rack.
>
> h2. Real-world test result
>
> We first noticed the problem on our HBase cluster running Hadoop 3.3.6 on 18 nodes across 3 racks. After setting the RS-6-3-1024K policy on the HBase data directory, we noticed that:
>
> 1. FSCK reports "Unsatisfactory placement block groups" for small EC files:
> {noformat}
> /hbase/***: Replica placement policy is violated for ***. Block should be additionally replicated on 2 more rack(s). Total number of racks in the cluster: 3
> ...
> Erasure Coded Block Groups:
> ...
> Unsatisfactory placement block groups: 1475 (2.5252092 %)
> {noformat}
> 2. The namenode keeps logging "PendingReconstructionMonitor timed out" messages every recheck-interval (5 minutes).
> 3. The {{FSNamesystem.UnderReplicatedBlocks}} metric bumps and clears every recheck-interval.
>
> After applying the patch, all of these problems are gone: "Unsatisfactory placement block groups" is now zero, and there are no metric bumps or "timed out" logs.
>
> !screenshot-1.png|width=500!
> !image-2024-09-10-13-22-50-247.png|width=500!
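To make step 1 of the proposed solution concrete, here is a minimal, self-contained sketch of the rack-aware reordering idea, applied to the 3-rack example from the description. This is not the actual HDFS-17620 patch: {{Node}}, {{reorderByRack}}, and the round-robin strategy are illustrative assumptions (the real change would live in the NameNode's block placement code), and the sketch targets a recent JDK for brevity.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RackAwareReorderSketch {

  /** Hypothetical stand-in for a chosen storage location. */
  record Node(String name, String rack) {}

  /**
   * Group the chosen locations by rack, then emit one node per rack in
   * turn, so consecutive slots cycle through the racks instead of
   * exhausting one rack before moving to the next.
   */
  static List<Node> reorderByRack(List<Node> chosen) {
    Map<String, Deque<Node>> byRack = new LinkedHashMap<>();
    for (Node n : chosen) {
      byRack.computeIfAbsent(n.rack(), r -> new ArrayDeque<>()).add(n);
    }
    List<Node> out = new ArrayList<>(chosen.size());
    while (out.size() < chosen.size()) {
      for (Deque<Node> q : byRack.values()) {
        if (!q.isEmpty()) {
          out.add(q.removeFirst());
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // The pathological layout from the issue: 3 racks, 3 nodes each,
    // chosen in shortest-path order.
    List<Node> chosen = List.of(
        new Node("N1", "R1"), new Node("N2", "R1"), new Node("N3", "R1"),
        new Node("N4", "R2"), new Node("N5", "R2"), new Node("N6", "R2"),
        new Node("N7", "R3"), new Node("N8", "R3"), new Node("N9", "R3"));

    // A sub-cell RS-6-3 file occupies slot 0 (D1) and slots 6..8 (P1..P3).
    List<Node> reordered = reorderByRack(chosen);
    Set<String> racksUsed = new TreeSet<>();
    racksUsed.add(reordered.get(0).rack());          // data block D1
    for (int i = 6; i < 9; i++) {                    // parity blocks P1..P3
      racksUsed.add(reordered.get(i).rack());
    }
    // The reordered slots are N1 N4 N7 N2 N5 N8 N3 N6 N9, so the four
    // used slots land on R1, R1, R2, R3: all three racks, instead of
    // just R1 and R3 with the original order.
    System.out.println("Racks used: " + racksUsed);
  }
}
{code}

Whether the actual patch uses exactly this round-robin pass is not spelled out in the description above; the sketch only illustrates why a rack-interleaved ordering makes the trailing parity slots of a small file rack-diverse, which is what makes FSCK's placement check pass.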