[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887490#comment-16887490
 ] 

Ayush Saxena commented on HDFS-14655:
-------------------------------------

Thanx [~vagarychen] we too tried with just the approach of cancelling the 
outstanding calls before we move hand. 
 Firstly when the {{dfs.ha.tail-edits.period}} period is 0. With this approach, 
the problem doesn't get solved. As when we call, we don't actually kill the 
thread, but just call interrupt, and when the connection thread comes in the 
next iteration and sees it being interrupted, then it stops. So this whole 
process also has some latency. With 0 as configured value, The pace at which 
the thread is removed by cancel is outsmarted by the tailing frequency.
 Secondly, As you said, for normal scenario also, this shall be happen, when 
majority comes fast and others were just suppose to come(We can't expect 
practically for all responses together), so we will land up killing them too. 
And this is little noise too as the LOGs shall also be filled with 
InterruptedException.

 

> SBN : Namenode crashes if one of The JN is down
> -----------------------------------------------
>
>                 Key: HDFS-14655
>                 URL: https://issues.apache.org/jira/browse/HDFS-14655
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Harshakiran Reddy
>            Priority: Major
>         Attachments: HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXXXXXX/XXXXXXX | Retrying connect to server: XXXXXXX/XXXXXXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>       at java.lang.Thread.start0(Native Method)
>       at java.lang.Thread.start(Thread.java:717)
>       at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>       at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>       at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>       at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>       at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>       at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>       at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>       at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>       at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:360)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>       at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to