[ 
https://issues.apache.org/jira/browse/HDFS-16574?focusedWorklogId=771780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771780
 ]

ASF GitHub Bot logged work on HDFS-16574:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/May/22 10:09
            Start Date: 18/May/22 10:09
    Worklog Time Spent: 10m 
      Work Description: hadoop-yetus commented on PR #4322:
URL: https://github.com/apache/hadoop/pull/4322#issuecomment-1129823794

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 41s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 46s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 57s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  26m 18s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  3s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4322/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 281 unchanged 
- 0 fixed = 284 total (was 281)  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 347m  7s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4322/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 464m 39s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.TestReplaceDatanodeFailureReplication |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4322/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4322 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux bddb66d5dad2 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 375c2f3f576eabfc4581788fb19cb12ee2ac7b98 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4322/2/testReport/ |
   | Max. process+thread count | 1894 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4322/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 771780)
    Time Spent: 0.5h  (was: 20m)

> Reduce the time the FSNamesystem write lock is held at a stretch while 
> removing blocks associated with dead datanodes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16574
>                 URL: https://issues.apache.org/jira/browse/HDFS-16574
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: ZhiWei Shi
>            Assignee: ZhiWei Shi
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{BlockManager.removeBlocksAssociatedTo(final DatanodeDescriptor node)}}
> {code:java}
> /** Remove the blocks associated to the given datanode. */
>   void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
>     providedStorageMap.removeDatanode(node);
>     final Iterator<BlockInfo> it = node.getBlockIterator();
>     while(it.hasNext()) {
>       removeStoredBlock(it.next(), node);
>     }
>     // Remove all pending DN messages referencing this DN.
>     pendingDNMessages.removeAllMessagesForDatanode(node);
>     node.resetBlocks();
>     invalidateBlocks.remove(node);
>   }
> {code}
> it holds the FSNamesystem write lock while removing all blocks associated 
> with a dead datanode, which can take a long time when the datanode has a 
> large number of blocks, e.g. 49.2s for 22.30M blocks and 7.9s for 2.84M 
> blocks.
> {code:java}
> # ${ip-1} has 22304612 blocks
> 2022-05-08 13:43:35,864 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /default-rack/${ip-1}:800
> 2022-05-08 13:43:35,864 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 49184 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
> java.lang.Thread.run(Thread.java:745)
>         Number of suppressed write-lock reports: 0
>         Longest write-lock held interval: 49184 
> # ${ip-2} has ~2.84M blocks
> 2022-05-08 08:11:55,559 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /default-rack/${ip-2}:800
> 2022-05-08 08:11:55,560 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 7925 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
> java.lang.Thread.run(Thread.java:745)
>     Number of suppressed write-lock reports: 0
>     Longest write-lock held interval: 7925{code}
> This will block all RPC requests that require the FSNamesystem lock, and the 
> NameNode cannot respond to the HealthMonitor request from ZKFC in time, 
> which eventually leads to a NameNode failover.
> {code:java}
> 2022-05-08 13:43:32,279 WARN org.apache.hadoop.ha.HealthMonitor: 
> Transport-level exception trying to monitor health of NameNode at 
> hd044.corp.yodao.com/10.108.162.60:8000: java.net.SocketTimeoutException: 
> 45000 millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 
> remote=hd044.corp.yodao.com/10.108.162.60:8000] Call From 
> hd044.corp.yodao.com/10.108.162.60 to hd044.corp.yodao.com:8000 failed on 
> socket timeout exception: java.net.SocketTimeoutException: 45000 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 
> remote=hd044.corp.yodao.com/10.108.162.60:8000]; For more details see:  
> http://wiki.apache.org/hadoop/SocketTimeout
> 2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.HealthMonitor: Entering 
> state SERVICE_NOT_RESPONDING
> 2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.ZKFailoverController: Local 
> service NameNode at hd044.corp.yodao.com/10.108.162.60:8000 entered state: 
> SERVICE_NOT_RESPONDING
> 2022-05-08 13:43:32,922 INFO 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController: -- Local NN thread dump 
> --
> Process Thread Dump: 
> ......
> Thread 100 (IPC Server handler 0 on 8000):
>   State: WAITING
>   Blocked count: 14895292
>   Waited count: 210385351
>   Waiting on 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@5723ab57
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readLock(FSNamesystemLock.java:144)
>     
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1556)
>     
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1842)
>     
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:707)
>     
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
>     
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>     org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>     org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>     org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> ......
> Thread 41 
> (org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@7c351808):
>   State: RUNNABLE
>   Blocked count: 10
>   Waited count: 720146
>   Stack:
>     
> org.apache.hadoop.hdfs.util.LightWeightHashSet.resize(LightWeightHashSet.java:479)
>     
> org.apache.hadoop.hdfs.util.LightWeightHashSet.expandIfNecessary(LightWeightHashSet.java:497)
>     
> org.apache.hadoop.hdfs.util.LightWeightHashSet.add(LightWeightHashSet.java:244)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.UnderReplicatedBlocks.update(UnderReplicatedBlocks.java:312)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReplications(BlockManager.java:3718)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeStoredBlock(BlockManager.java:3310)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlocksAssociatedTo(BlockManager.java:1314)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:638)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:683)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
>     
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
>     java.lang.Thread.run(Thread.java:745)
> .....
>  -- Local NN thread dump --
> 2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ZKFailoverController: 
> Quitting master election for NameNode at 
> hd044.corp.yodao.com/10.108.162.60:8000 and marking that fencing is necessary
> 2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Yielding from election
> 2022-05-08 13:43:32,925 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x97e5d7aac530e03 closed
> 2022-05-08 13:43:32,925 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Ignoring stale result from old client with sessionId 0x97e5d7aac530e03
> 2022-05-08 13:43:32,926 INFO org.apache.zookeeper.ClientCnxn: EventThread 
> shut down
> 2022-05-08 13:43:36,186 INFO org.apache.hadoop.ha.HealthMonitor: Entering 
> state SERVICE_HEALTHY{code}
> After it has processed a certain number of blocks or held the lock for a 
> certain amount of time, it should release the FSNamesystem write lock for a 
> short period so that other RPC requests can be processed in time.
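The batched-release idea proposed above can be sketched as follows. This is a minimal, self-contained illustration, not the actual change in PR #4322: a plain `ReentrantReadWriteLock` stands in for `FSNamesystemLock`, `String` stands in for `BlockInfo`, and the batch size `BLOCKS_PER_LOCK_BATCH` is a hypothetical tuning knob.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedBlockRemoval {
    // Hypothetical knob: how many blocks to remove per write-lock hold.
    private static final int BLOCKS_PER_LOCK_BATCH = 1000;

    // Fair lock so queued readers get a turn between batches,
    // standing in for FSNamesystemLock.
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Stand-in for the blocks map maintained by BlockManager.
    final List<String> storedBlocks = new ArrayList<>();

    /** Remove the given blocks, releasing the write lock between batches. */
    void removeBlocksAssociatedTo(Iterator<String> it) {
        while (it.hasNext()) {
            lock.writeLock().lock();
            try {
                int processed = 0;
                while (it.hasNext() && processed < BLOCKS_PER_LOCK_BATCH) {
                    // Stand-in for removeStoredBlock(it.next(), node).
                    storedBlocks.remove(it.next());
                    processed++;
                }
            } finally {
                // Releasing here lets waiting readers (RPC handlers, the
                // ZKFC health check) acquire the lock before the next batch.
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        BatchedBlockRemoval bm = new BatchedBlockRemoval();
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            String b = "blk_" + i;
            bm.storedBlocks.add(b);
            toRemove.add(b);
        }
        // 2500 blocks and a batch size of 1000 means the write lock is
        // acquired and released three times instead of held once throughout.
        bm.removeBlocksAssociatedTo(toRemove.iterator());
        System.out.println("remaining=" + bm.storedBlocks.size());
    }
}
```

The trade-off is that the iterator's view may change while the lock is dropped, so a real patch would also need to handle blocks reported or re-added between batches; the sketch shows only the lock-chunking pattern itself.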



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
