[
https://issues.apache.org/jira/browse/HDFS-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700274#comment-17700274
]
ASF GitHub Bot commented on HDFS-16942:
---------------------------------------
ayushtkn commented on PR #5460:
URL: https://github.com/apache/hadoop/pull/5460#issuecomment-1468343688
> But will the checkstyle not affect all future PRs? I think I have seen
that happen before, where a new checkstyle issue comes in, and then future PRs
are impacted by it, but I may be wrong.
How? Do you mean that new PRs will also show this checkstyle warning? Our
checkstyle check works by computing the diff between trunk and the patch, and
it flags the patch only when the number of warnings changes.
If you check the warning above:
```
hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 79 unchanged -
0 fixed = 81 total (was 79)
```
There were already 79 warnings on trunk, and the PR added 2 more, so that
increase is what it flags.
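As a rough illustration of that arithmetic, here is a minimal sketch of how the
counts in the report line relate to each other. It is not the actual precommit
implementation; the class name, the method name, and the idea of representing
each warning as a string are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: compares the warnings found on trunk with the
// warnings found after applying the patch.
class CheckstyleDiff {

    /** Builds a summary line in the same shape as the report quoted above. */
    static String summarize(Set<String> trunk, Set<String> patch) {
        // Warnings present with the patch but not on trunk are "new".
        Set<String> added = new HashSet<>(patch);
        added.removeAll(trunk);

        // Warnings present on trunk but gone with the patch are "fixed".
        Set<String> fixed = new HashSet<>(trunk);
        fixed.removeAll(patch);

        // Everything else in the patched tree was already there.
        int unchanged = patch.size() - added.size();

        return String.format("%d new + %d unchanged - %d fixed = %d total (was %d)",
                added.size(), unchanged, fixed.size(), patch.size(), trunk.size());
    }

    public static void main(String[] args) {
        Set<String> trunk = new HashSet<>();
        for (int i = 0; i < 79; i++) {
            trunk.add("existing-warning-" + i);
        }
        Set<String> patch = new HashSet<>(trunk);
        patch.add("new-warning-a");
        patch.add("new-warning-b");

        // prints "2 new + 79 unchanged - 0 fixed = 81 total (was 79)"
        System.out.println(summarize(trunk, patch));
    }
}
```

Under this model a later PR starts from a trunk that already contains the 81
warnings, so it is flagged only if it adds warnings of its own.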
> Send error to datanode if FBR is rejected due to bad lease
> ----------------------------------------------------------
>
> Key: HDFS-16942
> URL: https://issues.apache.org/jira/browse/HDFS-16942
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.6
>
>
> When a datanode sends a FBR to the namenode, it requires a lease to send it.
> On a couple of busy clusters, we have seen an issue where the DN is somehow
> delayed in sending the FBR after requesting the lease. Then the NN rejects
> the FBR and logs a message to that effect, but from the Datanode's point of
> view, the report was successful, so it does not try to send another
> report until the 6 hour default interval has passed.
> If this happens to a few DNs, there can be missing and under replicated
> blocks, further adding to the cluster load. Even worse, I have seen the DNs
> join the cluster with zero blocks, so it is not obvious the under replication
> is caused by a lost FBR, as all DNs appear to be up and running.
> I believe we should propagate an error back to the DN if the FBR is rejected.
> That way, the DN can request a new lease and try again.