[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540189#comment-14540189
 ] 

Kihwal Lee commented on HDFS-8344:
----------------------------------

The initial incremental block report (IBR, "receiving") usually carries the size of 
the first packet, since that is what is sent during the creation of the block output 
stream. The final size is reported in the "received" IBR, which is sent after the 
block replica is finalized. The official size is set when the client commits the 
block with the gen stamp and the size.

Let's look at the block creation cycle (a client-side sketch follows the list).
{panel}
a. The NN serves the addBlock() request for the client. The new block is added to 
the block collection (INode).
b. The client calls createBlockOutputStream() and writes the first packet of the 
block.
c. The datanodes receive the first packet and send a "receiving" IBR to the NN.
d. The client keeps writing.
e. The client sends the last packet. The datanodes finalize the block and send a 
"received" IBR to the NN.
f. Once the client receives the ACK for the last packet, it commits the block on the 
namenode (via either addBlock() or complete()).
{panel}
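
A minimal client-side sketch of the calls that drive steps {{a}} through {{f}}; the 
path and payload sizes are made up for illustration.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockCreationCycle {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // a: the NN allocates the first block via addBlock() once the client starts writing.
    FSDataOutputStream out = fs.create(new Path("/tmp/block-cycle-demo"));
    out.write(new byte[64 * 1024]);
    // b, c: hflush() pushes the first packet down the pipeline;
    //       the datanodes report "receiving" IBRs.
    out.hflush();
    // d: the client keeps writing.
    out.write(new byte[1024 * 1024]);
    // e, f: close() sends the last packet, the datanodes finalize the replica and
    //       report "received" IBRs, and the client commits the block via complete().
    out.close();
  }
}
{code}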

A client can fail at any time, but we are interested in the cases where the 
namenode has block locations but the client did not close the file. If the 
client fails right after {{a}}, no datanode actually has the block, so the size 
should be 0. If the client fails after {{b}} but before {{e}}, the size will be 
that of the first packet. 

I agree that such a file should not be left dangling for a long time. Since we do 
not want to unconditionally discard the last block, let's first think about 
detecting and reporting such files. If block recovery fails n times, it can be 
reported in the metrics as well as in the UI. Admins can then bring back the node, 
truncate the file, or delete it. If a certain kind of action is always preferred, 
it can be automated externally. What do you think?
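
A sketch of what such manual (or externally automated) remediation could look like, 
using existing client APIs; the path is hypothetical and the truncate/delete calls 
are just the alternatives mentioned above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverDanglingFile {
  public static void main(String[] args) throws Exception {
    Path file = new Path("/tmp/dangling-file");   // hypothetical path
    DistributedFileSystem dfs =
        (DistributedFileSystem) file.getFileSystem(new Configuration());

    // Ask the NN to recover the lease; returns true once the file has been closed.
    boolean closed = dfs.recoverLease(file);
    if (!closed) {
      System.out.println("Lease not recovered yet; an admin could instead:");
      // dfs.truncate(file, 0);      // drop the dangling last block's data
      // dfs.delete(file, false);    // or remove the file entirely
    }
  }
}
{code}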

> NameNode doesn't recover lease for files with missing blocks
> ------------------------------------------------------------
>
>                 Key: HDFS-8344
>                 URL: https://issues.apache.org/jira/browse/HDFS-8344
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch
>
>
> I found another(?) instance in which the lease is not recovered. This is 
> reproducible easily on a pseudo-distributed single node cluster.
> # Before you start, it helps if you set the limits below. This is not necessary, 
> but simply reduces how long you have to wait
> {code}
>       public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
>       public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
> LEASE_SOFTLIMIT_PERIOD;
> {code}
> # Client starts to write a file. (It could be less than 1 block, but it is 
> hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
> client code I am using. I generate a jar and run it using $ hadoop jar 
> TestHadoop.jar)
> # Client crashes. (I simulate this by kill -9'ing the $(hadoop jar 
> TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
> # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
> only 1)
> I believe the lease should be recovered and the block should be marked 
> missing. However, this is not happening. The lease is never recovered.
> The effect of this bug for us was that nodes could not be decommissioned 
> cleanly. Although we knew that the client had crashed, the Namenode never 
> released the leases (even after restarting the Namenode) (even months 
> afterwards). There are actually several other cases too where we don't 
> consider what happens if ALL the datanodes die while the file is being 
> written, but I am going to punt on that for another time.
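
For reference, a hypothetical reconstruction of the kind of client described in 
step 2 of the report above (the actual TestHadoop.jar source is not included in 
this mail); the path and payload are assumptions.
{code}
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHadoop {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/lease-test"));
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
    writer.write("some data that fits in the first block");
    writer.flush();       // push the buffered data into the HDFS output stream
    out.hflush();         // make sure the datanodes have received it
    System.out.println("Wrote to the bufferedWriter");
    Thread.sleep(Long.MAX_VALUE);   // hold the file open until the process is kill -9'd
  }
}
{code}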


