Failed pipeline creation during append leaves lease hanging on NN
-----------------------------------------------------------------

                 Key: HDFS-1262
                 URL: https://issues.apache.org/jira/browse/HDFS-1262
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client, name-node
    Affects Versions: 0.20-append
            Reporter: Todd Lipcon
            Priority: Critical
             Fix For: 0.20-append


Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened 
was the following:
1) File's original writer died
2) Recovery client tried to open the file for append - looped for a minute or so
until the soft lease expired, then the append call initiated recovery
3) Recovery completed successfully
4) Recovery client called append again, which succeeded on the NN
5) For some reason, the block recovery that happens at the start of append 
pipeline creation failed on all datanodes 6 times, causing the append() call to 
throw an exception back to HBase master. HBase assumed the file wasn't open and 
put it back on a queue to try later
6) Some time later, HBase tried append again, but the lease was still assigned
to the same DFS client, so the file could not be recovered.

The recovery failure in step 5 is a separate issue. The problem for this JIRA
is that the client can think it failed to open a file for append while the NN
thinks that client holds the lease as the writer. Since the client keeps
renewing its lease, recovery never happens, and no one can open or recover the
file until the DFS client shuts down.
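
The sequence in steps 4-6 can be illustrated with a toy model. This is only a
sketch and not Hadoop code: ToyNameNode, tryAppend, clientShutdown, the path
and the client name are all hypothetical, and lease renewal is modeled simply
by the lease never expiring while the client is alive.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the lease behavior described above; not Hadoop code. */
public class LeaseLeakSketch {

    /** Hypothetical stand-in for the NameNode's lease table. */
    static class ToyNameNode {
        private final Map<String, String> leaseHolderByPath = new HashMap<>();

        /** Grants the lease, or throws if any client (even the same one) holds it. */
        void append(String path, String client) {
            String holder = leaseHolderByPath.get(path);
            if (holder != null) {
                throw new IllegalStateException(path + " is already leased to " + holder);
            }
            leaseHolderByPath.put(path, client);
        }

        /** Drops every lease held by the client, as happens when it shuts down. */
        void clientShutdown(String client) {
            leaseHolderByPath.values().removeIf(client::equals);
        }

        String leaseHolder(String path) {
            return leaseHolderByPath.get(path);
        }
    }

    /**
     * Hypothetical client-side append: the NN call succeeds first, then
     * pipeline creation may fail. On failure the lease is NOT released,
     * which is the behavior this issue describes.
     */
    static boolean tryAppend(ToyNameNode nn, String path, String client,
                             boolean pipelineFails) {
        try {
            nn.append(path, client); // step 4: succeeds on the NN
            if (pipelineFails) {
                // step 5: block recovery fails on all datanodes
                throw new RuntimeException("pipeline creation failed");
            }
            return true;
        } catch (RuntimeException e) {
            return false; // caller assumes the file was never opened
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        String path = "/example/file";      // hypothetical path
        String client = "DFSClient_example"; // hypothetical client name

        boolean first = tryAppend(nn, path, client, true);  // step 5
        boolean retry = tryAppend(nn, path, client, false); // step 6
        System.out.println("first=" + first + " retry=" + retry
                + " holder=" + nn.leaseHolder(path));

        nn.clientShutdown(client); // the only way out in this model
        System.out.println("after shutdown=" + tryAppend(nn, path, client, false));
    }
}
```

In this model both the first append and the retry report failure to the
caller while the toy NN still lists the client as the lease holder. The append
only succeeds after clientShutdown releases the lease.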
