[
https://issues.apache.org/jira/browse/HDFS-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716758#comment-17716758
]
ASF GitHub Bot commented on HDFS-16989:
---------------------------------------
hadoop-yetus commented on PR #5593:
URL: https://github.com/apache/hadoop/pull/5593#issuecomment-1523544877
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 6s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 46m 45s | | trunk passed |
| +1 :green_heart: | compile | 1m 35s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | checkstyle | 1m 8s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 33s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | spotbugs | 3m 48s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 17s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 20s | | the patch passed |
| +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 1m 23s | | the patch passed |
| +1 :green_heart: | compile | 1m 15s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | javac | 1m 15s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 51s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 28 unchanged - 0 fixed = 31 total (was 28) |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 1m 24s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| -1 :x: | spotbugs | 3m 31s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html) | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | shadedclient | 28m 51s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 331m 8s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 457m 9s | | |
| Reason | Tests |
|-------:|:------|
| SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor$BlockTargetPair defines equals and uses Object.hashCode() At DatanodeDescriptor.java:Object.hashCode() At DatanodeDescriptor.java:[lines 88-91] |
| Failed junit tests | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |
| | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.server.datanode.TestReadOnlySharedStorage |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.TestBlockStoragePolicy |
| | hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling |
| | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.server.namenode.TestHostsFiles |
| | hadoop.hdfs.server.datanode.TestDeleteBlockPool |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
| | hadoop.hdfs.server.blockmanagement.TestNodeCount |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks |
| | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
| | hadoop.hdfs.server.datanode.TestBlockRecovery2 |
| | hadoop.hdfs.TestFileAppend4 |
| | hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy |
| | hadoop.hdfs.server.blockmanagement.TestPendingReconstruction |
| | hadoop.hdfs.server.mover.TestStorageMover |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.datanode.TestDataNodeTcpNoDelay |
| | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.TestProcessCorruptBlocks |
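The new SpotBugs warning above (a class that defines equals but inherits Object.hashCode()) is the standard HE_EQUALS_USE_HASHCODE pattern. A minimal illustrative stand-in, not the actual Hadoop `BlockTargetPair` source, showing why both methods must be overridden together:

```java
import java.util.Objects;

// Illustrative sketch of the equals/hashCode contract SpotBugs is enforcing.
// Field names are assumptions for the example, not the Hadoop implementation.
public class BlockTargetPairSketch {
    final long blockId;
    final String target;

    BlockTargetPairSketch(long blockId, String target) {
        this.blockId = blockId;
        this.target = target;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlockTargetPairSketch)) return false;
        BlockTargetPairSketch other = (BlockTargetPairSketch) o;
        return blockId == other.blockId && Objects.equals(target, other.target);
    }

    // The piece SpotBugs flags as missing: without this override,
    // Object.hashCode() is used and two equal pairs can land in different
    // buckets of a HashSet/HashMap.
    @Override
    public int hashCode() {
        return Objects.hash(blockId, target);
    }

    public static void main(String[] args) {
        BlockTargetPairSketch a = new BlockTargetPairSketch(1L, "dn1");
        BlockTargetPairSketch b = new BlockTargetPairSketch(1L, "dn1");
        if (!(a.equals(b) && a.hashCode() == b.hashCode())) {
            throw new AssertionError("equals/hashCode contract violated");
        }
    }
}
```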
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5593 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c53911d21bc4 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0af5c7247cec55abd7b93ff474b42baeef4331b2 |
| Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/testReport/ |
| Max. process+thread count | 2413 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5593/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> Large scale block transfer causes too many excess blocks
> --------------------------------------------------------
>
> Key: HDFS-16989
> URL: https://issues.apache.org/jira/browse/HDFS-16989
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0, 3.3.5
> Reporter: farmmamba
> Priority: Critical
> Labels: pull-request-available
>
> Recently, we changed the replication factor of a directory holding 1.6 PB of
> data from 2 to 3. There are 76 million blocks in this directory. After we
> executed the setrep command, the active NameNode printed many logs like the following:
> {code:java}
> PendingReconstructionMonitor timed out blk_xxxx_260285131{code}
> and many DataNodes printed many duplicated logs like the following:
> {code:java}
> 2023-04-21 13:58:17,627 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> 2023-04-21 14:21:21,296 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 1.1.1.1:50010: Transmitted BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 (numBytes=524384907) to /2.2.2.2:50010
> 14:34:19,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> 2023-04-21 14:37:58,207 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> 14:40:46,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1.1.1.1:50010, datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> {code}
> Notably, the same block blk_1333463885_260285131 was transferred multiple
> times even though it had already been transmitted successfully. The excess
> transfer requests trigger many ReplicaAlreadyExistsException errors on the
> target DataNode, because the replica has already been transmitted and its
> state is FINALIZED.
>
> The root cause lies in RedundancyMonitor#processPendingReconstructions and
> BlockManager#validateReconstructionWork:
> 1. RedundancyMonitor#computeDatanodeWork() generates transfer tasks from
> neededReconstruction via addTaskToDatanode, and puts the tasks into
> pendingReconstruction.
> 2. We set *dfs.namenode.replication.work.multiplier.per.iteration = 200,* and
> the cluster has 400 DataNodes, so RedundancyMonitor may generate up to
> 80,000 (200 x 400) block transfer tasks per iteration. After
> dfs.namenode.reconstruction.pending.timeout-sec (5 minutes), the requests in
> pendingReconstruction time out, and the PendingReconstructionMonitor thread
> moves the timed-out requests into timedOutItems.
> 3. RedundancyMonitor#processPendingReconstructions() puts the requests in
> timedOutItems back into neededReconstruction.
> 4. The monitor then sleeps:
> TimeUnit.MILLISECONDS.sleep(redundancyRecheckIntervalMs);
> 5. In the next iteration of the while loop,
> RedundancyMonitor#computeDatanodeWork() generates transfer tasks from
> neededReconstruction again, which can produce a repeated task or a task with
> a different target node (due to the chooseTarget method).
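The timeout/re-queue cycle in the steps above can be sketched as a toy simulation. All class and field names here (RedundancySim, PENDING_TIMEOUT, etc.) are illustrative stand-ins under the description's assumptions, not the actual NameNode code: each monitor tick drains neededReconstruction into pendingReconstruction, and any entry that outlives the timeout is moved back into neededReconstruction, so a slow transfer is scheduled again and again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy model of the RedundancyMonitor loop in steps 1-5 above.
// Names and numbers are illustrative; this is not the Hadoop implementation.
public class RedundancySim {
    static final long PENDING_TIMEOUT = 5;  // stands in for pending.timeout-sec
    static long now = 0;                    // simulated clock, one unit per tick

    static Queue<String> neededReconstruction = new ArrayDeque<>();
    static Map<String, Long> pendingReconstruction = new HashMap<>();
    static List<String> scheduledTransfers = new ArrayList<>();

    // Step 1: computeDatanodeWork() drains neededReconstruction into
    // pendingReconstruction, recording a transfer task for each block.
    static void computeDatanodeWork() {
        while (!neededReconstruction.isEmpty()) {
            String block = neededReconstruction.poll();
            scheduledTransfers.add(block + "@t" + now);
            pendingReconstruction.put(block, now);
        }
    }

    // Steps 2-3: timed-out pending entries are re-queued into
    // neededReconstruction, with no memory that a transfer is in flight.
    static void processPendingReconstructions() {
        pendingReconstruction.entrySet().removeIf(e -> {
            if (now - e.getValue() >= PENDING_TIMEOUT) {
                neededReconstruction.add(e.getKey());  // back into needed!
                return true;
            }
            return false;
        });
    }

    public static void main(String[] args) {
        neededReconstruction.add("blk_1333463885");
        // A slow transfer (e.g. the 524 MB block in the logs) never completes
        // within the timeout window, so each expiry schedules it again.
        for (int tick = 0; tick < 12; tick++) {
            now = tick;
            processPendingReconstructions();
            computeDatanodeWork();  // step 5: next iteration re-generates the task
        }
        System.out.println(scheduledTransfers);
        // The single block is scheduled several times, matching the repeated
        // "Starting thread to transfer" lines on the DataNode.
    }
}
```

With a 5-tick timeout over 12 ticks, the one block is scheduled at t0, t5, and t10, mirroring the duplicated transfer threads in the DataNode logs.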
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]