[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965185#comment-15965185 ]
Inigo Goiri commented on HDFS-11576:
------------------------------------
I would log the events the other way around:
* Info: when a block has timed out and we issue a new request.
* Debug: when a block is still within the timeout.
BTW, with slf4j you could use parameterized formatting ({} placeholders) in the log messages.
Other than that, it looks good.
Anybody available to review this patch?
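As a rough illustration of the level split suggested above, here is a minimal Java sketch. The class name, the levelFor helper, and the message wording are hypothetical and not taken from the patch; the slf4j calls appear only in comments so the snippet compiles without extra dependencies.

```java
// Sketch only: class, method, and message names are hypothetical, not from the patch.
class RecoveryLogLevelSketch {
    enum Level { INFO, DEBUG }

    /**
     * Picks the log level for one recovery check.
     * INFO when the previous attempt has timed out (a new request is issued),
     * DEBUG when the attempt is still within the timeout (nothing changes).
     */
    static Level levelFor(long elapsedMs, long timeoutMs) {
        return elapsedMs >= timeoutMs ? Level.INFO : Level.DEBUG;
    }

    public static void main(String[] args) {
        // With slf4j, the two branches could use parameterized messages, e.g.:
        //   LOG.info("Recovery of block {} timed out, issuing a new request", block);
        //   LOG.debug("Recovery of block {} is still within the timeout", block);
        System.out.println(levelFor(12_000, 10_000)); // previous attempt timed out
        System.out.println(levelFor(3_000, 10_000));  // previous attempt still pending
    }
}
```

The idea is that the rare, state-changing event (a new recovery request) is visible at the default level, while the frequent no-op check stays at debug.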
> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch,
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after completing the first
> recovery, which fails because X < X+1
> ...
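
The scenario above can be reproduced with a toy model. This is a sketch under stated assumptions, not HDFS code: NameNodeModel, onHeartbeat, and simulate are hypothetical names, and the timeout-based suppression is only an assumption about how the problem could be avoided, inferred from the review comment about blocks that are "still within the timeout".

```java
// Toy model of the recovery-ID race. All names are hypothetical, not HDFS classes.
class RecoveryRaceSketch {

    /** Stand-in for the NameNode's view of a single block under recovery. */
    static final class NameNodeModel {
        private long recoveryId;           // latest recovery ID issued
        private long lastIssuedAtMs = -1;  // time of the last issue, -1 if none yet
        private final long timeoutMs;      // 0 means "reissue on every heartbeat"

        NameNodeModel(long initialId, long timeoutMs) {
            this.recoveryId = initialId;
            this.timeoutMs = timeoutMs;
        }

        /** Called per heartbeat; returns the recovery ID the DataNode is told to use. */
        long onHeartbeat(long nowMs) {
            boolean neverIssued = lastIssuedAtMs < 0;
            boolean timedOut = !neverIssued && nowMs - lastIssuedAtMs >= timeoutMs;
            if (neverIssued || timedOut) {
                recoveryId++;              // issue a fresh recovery command
                lastIssuedAtMs = nowMs;
            }
            return recoveryId;
        }

        /** The commit is rejected when the DataNode's ID is older than the latest one. */
        boolean commitBlockSynchronization(long id) {
            return id >= recoveryId;
        }
    }

    /** Returns true if the first recovery attempt manages to commit. */
    static boolean simulate(long heartbeatMs, long recoveryMs, long timeoutMs) {
        NameNodeModel nn = new NameNodeModel(100, timeoutMs);
        long idAtStart = nn.onHeartbeat(0);            // steps 1-3: DN starts recovery
        for (long t = heartbeatMs; t < recoveryMs; t += heartbeatMs) {
            nn.onHeartbeat(t);                         // steps 4-5: later heartbeats
        }
        return nn.commitBlockSynchronization(idAtStart); // step 6
    }

    public static void main(String[] args) {
        // Recovery (5 s) outlasts the heartbeat interval (3 s), no timeout tracking.
        System.out.println(simulate(3_000, 5_000, 0));
        // Same timings, but a new ID is issued only after a 10 s timeout.
        System.out.println(simulate(3_000, 5_000, 10_000));
    }
}
```

In this model the first call represents the reported behaviour, where every heartbeat bumps the recovery ID and the in-flight attempt can never commit; the second shows that holding the ID stable until a timeout elapses lets the attempt finish.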