[
https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lukas Majercak updated HDFS-11576:
----------------------------------
Comment: was deleted
(was: In the logs, the rejection of commitBlockSynchronization on the NN is:
{code}
INFO ipc.Server (Server.java:run(2200)) - IPC Server handler 1 on 26435, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization
from 127.0.0.1:26520 Call#554 Retry#0
java.io.IOException: The recovery id 1059 does not match current recovery id
1069 for block BP-325268981-10.123.116.86-1490315990951:blk_1073741825_1001
{code})
> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
> Key: HDFS-11576
> URL: https://issues.apache.org/jira/browse/HDFS-11576
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs, namenode
> Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is
> always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN finishes the first recovery and calls commitBlockSynchronization on the
> NN, which rejects it because the recovery ID X is stale (current is X+1)
> ...
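
The scenario above can be sketched as a small, self-contained simulation. This
is a toy model, not HDFS code: the class RecoveryIdSketch, the ToyNameNode
type, the everCommits helper, and the tick-based clock are all hypothetical
names introduced for illustration. It assumes the NN bumps the recovery ID on
every heartbeat while the block is still under recovery, and that a commit is
accepted only if it carries the most recently issued ID.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of the reported scenario; none of this is actual HDFS code.
class RecoveryIdSketch {

    // Hypothetical stand-in for the NameNode's per-block recovery state.
    static class ToyNameNode {
        private long currentRecoveryId;

        ToyNameNode(long initialRecoveryId) {
            this.currentRecoveryId = initialRecoveryId;
        }

        // Assumption: each heartbeat received while the block is unrecovered
        // makes the NN issue a new recovery command with a bumped ID.
        long onHeartbeat() {
            currentRecoveryId++;
            return currentRecoveryId;
        }

        // Rejects any commit that does not carry the latest issued ID.
        void commitBlockSynchronization(long recoveryId) throws IOException {
            if (recoveryId != currentRecoveryId) {
                throw new IOException("The recovery id " + recoveryId
                        + " does not match current recovery id " + currentRecoveryId);
            }
        }
    }

    // Returns true if any recovery attempt is accepted within maxTicks ticks.
    static boolean everCommits(int heartbeatInterval, int recoveryTime, int maxTicks) {
        ToyNameNode nn = new ToyNameNode(0);
        // Maps the tick at which a recovery finishes to the ID it was started with.
        Map<Integer, Long> pending = new HashMap<>();

        for (int tick = 0; tick <= maxTicks; tick++) {
            // The DN finishes a recovery started earlier and tries to commit it.
            Long finishedId = pending.remove(tick);
            if (finishedId != null) {
                try {
                    nn.commitBlockSynchronization(finishedId);
                    return true;
                } catch (IOException stale) {
                    // Rejected: a newer recovery ID was issued in the meantime.
                }
            }
            // The DN heartbeats; the NN answers with a new recovery command.
            if (tick % heartbeatInterval == 0) {
                long id = nn.onHeartbeat();
                pending.put(tick + recoveryTime, id);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Recovery (5 ticks) takes longer than the heartbeat interval (3 ticks).
        System.out.println(everCommits(3, 5, 100));
        // Recovery (3 ticks) finishes before the next heartbeat (5 ticks).
        System.out.println(everCommits(5, 3, 100));
    }
}
```

In this model, whenever recoveryTime exceeds heartbeatInterval, a newer ID has
always been issued by the time a recovery finishes, so no commit is accepted
however long the loop runs.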