[ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953640#comment-14953640
 ] 

Rushabh S Shah commented on HDFS-7916:
--------------------------------------

[~yzhangal]
{quote}We deployed this fix to one of our clusters, and unfortunately the 
datanodes were still spamming the namenode with the same stack trace as before.
We debugged the issue and found that the datanodes were receiving 
StandbyException wrapped in RemoteException,
while the patch was checking for StandbyException and not RemoteException.
{quote}
Initially we were catching StandbyException specifically. At that time we 
decided not to catch StandbyException in ErrorReportAction.
But then we discovered that the namenode was throwing StandbyException wrapped 
in RemoteException, so a catch block for StandbyException never matched on the datanode side.
So we chose to ignore every RemoteException in both classes and just log it 
at WARN.
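
For illustration, here is a minimal sketch of the idea, not the actual patch. The nested StandbyException and RemoteException classes are simplified stand-ins for the Hadoop classes of the same name, and NameNodeCall, Outcome and reportOnce are hypothetical names made up for this example. It also assumes that any other IOException should remain retryable.

```java
import java.io.IOException;

// Sketch only: simplified stand-ins, NOT the real Hadoop classes.
final class StandbyReportSketch {

    /** Stand-in for the exception the standby namenode raises on its side. */
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    /** Stand-in for the wrapper the RPC client sees: it is not a subclass of
     *  StandbyException, it only carries the server-side class name. */
    static class RemoteException extends IOException {
        private final String className;
        RemoteException(String className, String msg) {
            super(msg);
            this.className = className;
        }
        String getClassName() { return className; }
    }

    /** Hypothetical stand-in for the datanode-to-namenode RPC. */
    interface NameNodeCall {
        void reportBadBlocks() throws IOException;
    }

    enum Outcome { REPORTED, IGNORED_REMOTE, RETRY }

    /** The first attempt in effect tested this, which is false for a wrapped exception. */
    static boolean looksLikeStandby(IOException e) {
        return e instanceof StandbyException;
    }

    /** Catch the wrapper itself, log at WARN, and do not requeue the action. */
    static Outcome reportOnce(NameNodeCall nn) {
        try {
            nn.reportBadBlocks();
            return Outcome.REPORTED;
        } catch (RemoteException re) {
            System.err.println("WARN: namenode rejected bad block report: "
                    + re.getClassName() + ": " + re.getMessage());
            return Outcome.IGNORED_REMOTE;
        } catch (IOException e) {
            // Assumption: local or network failures are still worth retrying.
            return Outcome.RETRY;
        }
    }

    public static void main(String[] args) {
        NameNodeCall standby = () -> {
            throw new RemoteException("org.apache.hadoop.ipc.StandbyException",
                    "operation not supported in state standby");
        };
        System.out.println(reportOnce(standby));
    }
}
```

Because the datanode only ever sees the wrapper, matching on the wrapper type is what stops the retry loop. Matching on the inner type alone does not.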

Hope this helps.

> 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
> infinite loop
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-7916
>                 URL: https://issues.apache.org/jira/browse/HDFS-7916
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.0
>            Reporter: Vinayakumar B
>            Assignee: Rushabh S Shah
>            Priority: Critical
>             Fix For: 2.7.1
>
>         Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch
>
>
> If any bad block is found, the BPServiceActor (BPSA) for the standby 
> namenode retries reporting it indefinitely.
> {noformat}2015-03-11 19:43:41,528 WARN 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
> stobdtserver3/10.224.54.70:18010
> org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
> to report bad block 
> BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
>         at 
> org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
