[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727215#comment-17727215 ]
ASF GitHub Bot commented on HDFS-17030:
---------------------------------------

xinglin opened a new pull request, #5700:
URL: https://github.com/apache/hadoop/pull/5700

### Description of PR
Added support to fail fast when detecting an unreachable/unresponsive standby NN in ObserverReaderProxy

### How was this patch tested?
* Unit tests
```
~/p/h/t/hadoop-hdfs-project (HDFS-17030)> mvn test -Dtest="TestObserverReadProxyProvider.java"
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.server.namenode.ha.TestObserverReadProxyProvider
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.136 s - in org.apache.hadoop.hdfs.server.namenode.ha.TestObserverReadProxyProvider
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0
```

* Tested on a test cluster
  + We took a heap dump at a standby NN.
```
bash-4.2$ jmap -F -dump:format=b,file=heapdump-25801.hprof 25801
Attaching to process ID 25801, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.172-b11
Dumping heap to heapdump-25801.hprof ...
```
  + The existing Hadoop binary took more than 2 minutes to complete the list operation, because _ipc.client.rpc-timeout.ms_ is set to 2 minutes.
```
[xinglin@ltx1-hcl14866 ~]$ time hdfs dfs -ls /tmp/testFile.txt
23/05/24 23:07:05 INFO fs.FileBasedMountTableLoader: TID: 1 - Loading mount table from hdfs://ltx1-yugiohnn01.grid.linkedin.com:9000/mounttable/linkfs/ltx1-yugioh-router-mountpoints.json.
-rw-r--r

> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------
>
>                 Key: HDFS-17030
>                 URL: https://issues.apache.org/jira/browse/HDFS-17030
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>
> When namenode HA is enabled and a standby NN is not responsive, we have
> observed that it takes a long time to serve a request, even though a
> healthy observer or active NN is available.
> Basically, when a standby is down, the RPC client (re)tries to connect to
> that standby for _ipc.client.connect.timeout_ *
> _ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a
> heap dump at a standby, the NN still accepts the socket connection but does
> not respond to RPC requests, so the client only times out after
> _ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds, so a
> request takes more than 2 minutes to complete when we take a heap dump at a
> standby. This has been causing user job failures.
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending
> getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we
> would still use the original value from the config). However, that would
> double the number of socket connections between clients and the NN.
> The proposal is to add a timeout on getHAServiceState() calls in
> ObserverReaderProxy, so that we wait at most that long for an NN to report
> its HA state. Once the timeout passes, we move on to the next NN.
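
For illustration only, here is a minimal sketch of the idea in the issue description: wait a bounded time for getHAServiceState() and, if the NN does not answer, move on to the next NN. This is not the code from the pull request. `HaStateProbeSketch`, `NameNode`, `State`, `node`, `probe`, and `firstWithState` are all hypothetical names, and the `NameNode` interface is a stand-in for the real proxy type. The sketch assumes the state call runs on a separate thread, so the caller can stop waiting without lowering _ipc.client.rpc-timeout.ms_ for user requests.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HaStateProbeSketch {
  enum State { ACTIVE, OBSERVER, STANDBY }

  /** Hypothetical stand-in for a NameNode proxy; not a Hadoop type. */
  interface NameNode {
    String name();
    State getHAServiceState() throws Exception;
  }

  /** Helper to build a fake NameNode whose state call runs the given behavior. */
  static NameNode node(String name, Callable<State> behavior) {
    return new NameNode() {
      @Override public String name() { return name; }
      @Override public State getHAServiceState() throws Exception { return behavior.call(); }
    };
  }

  // Daemon threads, so a probe stuck on a hung NN never blocks JVM exit.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "ha-state-probe");
    t.setDaemon(true);
    return t;
  });

  /** Returns the NN's state, or null if it fails or does not answer within timeoutMs. */
  State probe(NameNode nn, long timeoutMs) {
    Callable<State> call = nn::getHAServiceState;
    Future<State> future = pool.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);   // stop waiting; interrupt the probe thread
      return null;
    } catch (ExecutionException e) {
      return null;           // the call itself failed (e.g. connection refused)
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      future.cancel(true);
      return null;
    }
  }

  /** Walks the NN list in order and returns the first NN reporting the wanted state. */
  NameNode firstWithState(List<NameNode> nns, State wanted, long timeoutMs) {
    for (NameNode nn : nns) {
      if (probe(nn, timeoutMs) == wanted) {
        return nn;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    HaStateProbeSketch sketch = new HaStateProbeSketch();
    // Simulates a standby that accepts the call but never responds in time.
    NameNode hung = node("nn1", () -> { Thread.sleep(120_000); return State.STANDBY; });
    NameNode observer = node("nn2", () -> State.OBSERVER);
    NameNode picked = sketch.firstWithState(List.of(hung, observer), State.OBSERVER, 500);
    System.out.println("picked " + (picked == null ? "none" : picked.name()));
  }
}
```

With this shape, a hung standby costs the client at most the probe timeout per lookup instead of the full RPC timeout. The trade-off is an extra thread per in-flight probe.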