Hello Mafish,

My apologies for this late response. Your email somehow got hidden in my mailbox!
1. The Namenode allocates a new GenerationStamp and gives it to the client. The Namenode stores this GenerationStamp in its transaction log, but it is not yet associated with any particular block. The client applies this stamp to all the Datanodes and then informs the Namenode. The Namenode then associates the stamp with the block and stores it persistently in the BlocksMap. Thus, it is possible for a Datanode to have a block generation stamp that is larger than what is stored in the BlocksMap.

2. Suppose a client was writing data to three Datanodes in a pipeline. Two of the Datanodes received some data and wrote it to their disks, while the third Datanode died before writing it. Now suppose the client died too. The Namenode will then try to do Lease Recovery for this file. It will find that the block size on two of the Datanodes is the same but is different on the third Datanode. Thus, "the size of a replica on different Datanodes could be different". (A small illustrative sketch follows after the quoted thread below.)

Let me know if my explanation helps,

Thanks,
dhruba

-----Original Message-----
From: Mafish Liu [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 17, 2007 10:51 PM
To: [email protected]
Subject: Re: when will the feature HADOOP-1700 be implemented ?

Hi, Dhruba

I have some questions after reading your design.

1. In the section "Lease recovery", you said, "Any Datanode that has a BlockGenerationStamp that is larger than what is stored in the BlocksMap is guaranteed to contain data from the last successful write to that block."

To my understanding, the BlocksMap, which is stored in the NameNode, always records the most recent mapping from file name to file blocks. When would the case arise that a DataNode has a larger BlockGenerationStamp than the one stored in the NameNode?

2. You said, "The block size of each of these replicas could still be different because the write from a client might not have been committed to all replicas."

To my understanding, before the client sends a "CloseFile" request to the NameNode, it must ensure that all of the replicas have been written correctly.

On Nov 16, 2007 2:44 PM, dhruba Borthakur <[EMAIL PROTECTED]> wrote:
> Actually, here is a more recent copy of the document:
>
> http://issues.apache.org/jira/secure/attachment/12369639/Appends.html
>
> thanks,
> dhruba
>
> -----Original Message-----
> From: dhruba Borthakur [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 15, 2007 10:10 PM
> To: [email protected]
> Subject: RE: when will the feature HADOOP-1700 be implemented ?
>
> Hi Mafish,
>
> The design document is attached as a file Appends.doc (or Appends.html)
> to the HADOOP-1700 bug description. Here is a direct link:
>
> http://issues.apache.org/jira/secure/attachment/12367999/Appends.htm
>
> If you are not able to access this link, please let me know and I will
> email you a copy of the document.
>
> This design is currently under review; I am waiting for review comments.
>
> Thanks,
> dhruba
>
>
> -----Original Message-----
> From: Mafish Liu [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 15, 2007 5:54 PM
> To: [email protected]
> Subject: Re: when will the feature HADOOP-1700 be implemented ?
>
> Hello, Dhruba
>
> Would you please give me a link to where your design can be found?
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing,
> China.
> Tel: 86 10 62601317
>

--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
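For readers following the thread, here is a minimal, purely illustrative sketch in Java of the two situations described in points 1 and 2 above: a Datanode replica carrying a generation stamp newer than the one recorded in the BlocksMap, and lease recovery reconciling replicas of different lengths. None of the class or method names below are actual HDFS code; everything (LeaseRecoverySketch, Replica, recoverLease, the reconcile-to-shortest-length policy) is a hypothetical illustration of the reasoning, not the implementation described in the Appends design document.

import java.util.List;

// Hypothetical sketch, NOT actual HDFS code: all names below are illustrative.
class LeaseRecoverySketch {

    // State a Datanode would report for one replica of the block being recovered.
    static class Replica {
        long generationStamp;  // stamp the Datanode actually persisted
        long numBytes;         // bytes the Datanode actually wrote to disk
        Replica(long gs, long len) { generationStamp = gs; numBytes = len; }
    }

    // Point 1: the Namenode logs a new stamp and hands it to the client, the client
    // pushes it to every Datanode, and only afterwards does the Namenode bind the
    // stamp to the block in the BlocksMap.  If the client dies in between, the
    // Datanodes hold a stamp larger than the one in the BlocksMap.
    long blocksMapStamp = 7;   // stamp currently bound to the block in the BlocksMap
                               // (the Datanodes may already be at 8)

    // Point 2: during lease recovery the surviving replicas can have different
    // lengths, because the last packet may not have reached every Datanode.
    void recoverLease(List<Replica> replicas, long freshStampFromNamenode) {
        // A replica whose stamp is newer than the BlocksMap's is guaranteed to hold
        // data from the last successful write; keep only the newest replicas.
        long bestStamp = replicas.stream()
                .mapToLong(r -> r.generationStamp)
                .max().orElse(blocksMapStamp);

        // Agree on a length all up-to-date replicas can satisfy (the shortest one).
        long agreedLength = replicas.stream()
                .filter(r -> r.generationStamp == bestStamp)
                .mapToLong(r -> r.numBytes)
                .min().orElse(0);

        for (Replica r : replicas) {
            if (r.generationStamp == bestStamp) {
                r.numBytes = agreedLength;                  // truncate to the agreed length
                r.generationStamp = freshStampFromNamenode; // re-stamp the recovered replica
            }
            // replicas with an older stamp would simply be discarded
        }
    }
}

Truncating the up-to-date replicas to the shortest common length is just one plausible reconciliation policy, assumed here to keep the sketch self-contained; the authoritative description of how recovery is actually performed is the Appends design document linked above.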
