[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883828#action_12883828 ]
sam rash commented on HDFS-1262:
--------------------------------
one note:
{code}
public void updateRegInfo(DatanodeID nodeReg) {
  name = nodeReg.getName();
  infoPort = nodeReg.getInfoPort();
  // update any more fields added in future.
}
{code}
should be:
{code}
public void updateRegInfo(DatanodeID nodeReg) {
  name = nodeReg.getName();
  infoPort = nodeReg.getInfoPort();
  ipcPort = nodeReg.getIpcPort();
  // update any more fields added in future.
}
{code}
It wasn't copying the ipcPort for some reason.
My patch includes this fix; trunk doesn't have this bug.
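
To illustrate the effect of the missing assignment, here is a minimal, self-contained sketch. DatanodeIdSketch is a hypothetical, simplified stand-in for DatanodeID, not the actual Hadoop class. The field and getter names follow the snippet above, and the host name and port values are made up.

```java
// Hypothetical, simplified stand-in for DatanodeID; not the real Hadoop class.
public class DatanodeIdSketch {
    private String name;   // "host:port" of the datanode
    private int infoPort;  // HTTP info port
    private int ipcPort;   // IPC port

    public DatanodeIdSketch(String name, int infoPort, int ipcPort) {
        this.name = name;
        this.infoPort = infoPort;
        this.ipcPort = ipcPort;
    }

    public String getName() { return name; }
    public int getInfoPort() { return infoPort; }
    public int getIpcPort() { return ipcPort; }

    // Fixed version: copies ipcPort along with the other registration fields.
    public void updateRegInfo(DatanodeIdSketch nodeReg) {
        name = nodeReg.getName();
        infoPort = nodeReg.getInfoPort();
        ipcPort = nodeReg.getIpcPort();
    }

    public static void main(String[] args) {
        // Made-up values: the node re-registers with a different IPC port.
        DatanodeIdSketch stored = new DatanodeIdSketch("dn1:50010", 50075, 50020);
        DatanodeIdSketch reRegistered = new DatanodeIdSketch("dn1:50010", 50075, 50021);
        stored.updateRegInfo(reRegistered);
        System.out.println(stored.getIpcPort()); // prints 50021
    }
}
```

Without the ipcPort assignment, the stored entry in this sketch would keep the old value (50020) after the update.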
> Failed pipeline creation during append leaves lease hanging on NN
> -----------------------------------------------------------------
>
> Key: HDFS-1262
> URL: https://issues.apache.org/jira/browse/HDFS-1262
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client, name-node
> Affects Versions: 0.20-append
> Reporter: Todd Lipcon
> Assignee: sam rash
> Priority: Critical
> Fix For: 0.20-append
>
> Attachments: hdfs-1262-1.txt
>
>
> Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened
> was the following:
> 1) File's original writer died
> 2) Recovery client tried to open file for append - looped for a minute or so
> until soft lease expired, then append call initiated recovery
> 3) Recovery completed successfully
> 4) Recovery client calls append again, which succeeds on the NN
> 5) For some reason, the block recovery that happens at the start of append
> pipeline creation failed on all datanodes 6 times, causing the append() call
> to throw an exception back to HBase master. HBase assumed the file wasn't
> open and put it back on a queue to try later
> 6) Some time later, it tried append again, but the lease was still assigned
> to the same DFS client, so it wasn't able to recover.
> The recovery failure in step 5 is a separate issue, but the problem for this
> JIRA is that the client can think it failed to open a file for append when the
> NN thinks the writer holds a lease. Since the writer keeps renewing its lease,
> recovery never happens, and no one can open or recover the file until the DFS
> client shuts down.
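
The hang described in the quoted report can be sketched with a toy model. Everything here is hypothetical: LeaseHangSketch, its methods, the 60-second soft limit, and the timestamps are illustrative assumptions, not HDFS APIs or the actual NN logic. It only shows how a lease that keeps being renewed blocks every later append attempt, including one from the same client.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the lease hang; all names and values are hypothetical, not HDFS APIs.
public class LeaseHangSketch {
    // Assumed soft lease limit for this sketch (milliseconds).
    static final long SOFT_LIMIT_MS = 60_000;

    private final Map<String, String> holder = new HashMap<>();    // path -> lease holder
    private final Map<String, Long> lastRenewal = new HashMap<>(); // path -> last renewal time

    // NN side of append: refuse while a lease is held and not soft-expired.
    public boolean append(String path, String client, long now) {
        String current = holder.get(path);
        if (current != null && now - lastRenewal.get(path) < SOFT_LIMIT_MS) {
            return false; // lease still live, even if the caller is the holder
        }
        holder.put(path, client);
        lastRenewal.put(path, now);
        return true;
    }

    // The client's background renewer refreshes every lease recorded for it.
    public void renew(String client, long now) {
        for (Map.Entry<String, String> e : holder.entrySet()) {
            if (e.getValue().equals(client)) {
                lastRenewal.put(e.getKey(), now);
            }
        }
    }

    public static void main(String[] args) {
        LeaseHangSketch nn = new LeaseHangSketch();
        // Step 4: append succeeds on the NN, which records the lease.
        System.out.println(nn.append("/log", "client-1", 0));       // prints true
        // Step 5: pipeline creation fails on the client; the NN is never told.
        // The client is still alive, so it keeps renewing its lease.
        nn.renew("client-1", 50_000);
        // Step 6: a later append is refused because the lease never expires.
        System.out.println(nn.append("/log", "client-1", 100_000)); // prints false
    }
}
```

In this model the second append would succeed if the renewal at 50,000 ms were skipped, since the lease would then be past the assumed soft limit.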