[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970371#comment-16970371
 ] 

Erik Krogen commented on HDFS-14969:
------------------------------------

+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started moving towards this, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs 😓 There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.
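
A minimal sketch of the second option, making the check aware of the configured NN count. This is not the actual RetryInvocationHandler code; the class name `FailoverLogPolicy`, the method `shouldLogAtInfo`, and both parameters are hypothetical names introduced only for illustration. It assumes that with N configured NNs a client can normally hit up to N - 1 standbys before reaching the active one.

```java
/**
 * Hypothetical helper (not part of Hadoop): decides whether a
 * StandbyException seen during client failover is worth an INFO log.
 */
final class FailoverLogPolicy {
    private FailoverLogPolicy() {}

    /**
     * @param standbyExceptionsSeenBefore how many StandbyExceptions this call
     *        already hit before the current one (0 for the first one)
     * @param configuredNameNodes number of NameNodes configured for the nameservice
     * @return true if the current StandbyException exceeds what a normal
     *         failover sequence would produce and should be logged at INFO
     */
    static boolean shouldLogAtInfo(int standbyExceptionsSeenBefore, int configuredNameNodes) {
        if (configuredNameNodes < 1 || standbyExceptionsSeenBefore < 0) {
            throw new IllegalArgumentException("invalid counts");
        }
        // In a normal scenario at most (N - 1) NameNodes are standby, so the
        // first (N - 1) StandbyExceptions are expected and can stay quiet.
        int expectedStandbyHits = configuredNameNodes - 1;
        return standbyExceptionsSeenBefore >= expectedStandbyHits;
    }
}
```

With two NNs this reduces to the current behavior (only the first StandbyException is skipped). With three NNs, the failover from the 2nd to the 3rd NN also stays quiet.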

> Fix HDFS client unnecessary failover log printing
> -------------------------------------------------
>
>                 Key: HDFS-14969
>                 URL: https://issues.apache.org/jira/browse/HDFS-14969
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.1.3
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Minor
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. 
> If a client starts an RPC with the 1st NN, it is silent when failing over 
> from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 
> 3rd NN it prints some unnecessary logs. In some scenarios these logs can be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
