This is expected behaviour. When the client writing a file runs on a
datanode, HDFS places the first replica of each block on that local
datanode, so with dfs.replication=1 every block of your file ends up on
A3. Since you have 4 datanodes, it might make sense to bump the
replication factor up to 2 or higher. Then you would see the other
datanodes getting filled up with blocks as well.
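
For example, you can verify where the blocks of a file landed with fsck,
and raise the replication factor of a file that is already in HDFS with
setrep (the path below is just a placeholder, and option support may
vary with your Hadoop version):

  hadoop fsck /user/chaochun/bigfile -files -blocks -locations
  hadoop fs -setrep -w 2 /user/chaochun/bigfile

Note that changing dfs.replication in hadoop-site.xml only affects files
written afterwards; setrep is what changes files already in HDFS.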

Thanks,
dhruba

-----Original Message-----
From: ChaoChun Liang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 12, 2007 11:12 PM
To: [email protected]
Subject: Re: Replication problem of HDFS


Thanks for your detailed example and explanation.

The problem I ran into is that all of a file's blocks are stored on the
same datanode, that is, (A1, A2, A3) would all end up on one datanode in
your example.

My test case is putting a file of about 1GB into HDFS (with the
"hadoop fs -put" command) on a cluster of 4 datanodes, with
dfs.block.size=67108864 and dfs.replication=1 set in hadoop-site.xml.
The upload is run from the namenode machine (A3 below), which also runs
a datanode.
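
For reference, the relevant entries in my hadoop-site.xml look like
this:

  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>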

The datanode status "before" the upload is
------------------------------------------------------------------------
Node    Last Contact    Admin State     Size (GB)       Used (%)        Blocks
A1      2               In Service      37.23           18.94           1
A2      2               In Service      36.06           19.30           1
A3      1               In Service      39.06           70.13           18
A4      1               In Service      39.06           18.52           1

The datanode status "after" the upload is
------------------------------------------------------------------------
Node    Last Contact    Admin State     Size (GB)       Used (%)        Blocks
A1      2               In Service      37.23           18.94           1
A2      2               In Service      36.06           19.30           1
A3      1               In Service      39.06           71.95           35
A4      1               In Service      39.06           18.52           1

You can see that the block count increases only on the A3 node (by 17
blocks, from 18 to 35, which matches a ~1GB file split into 16 full 64MB
blocks plus one partial block), while the block counts on the other
datanodes are unchanged.

Does this look like something is wrong, or is it a configuration
problem?

ChaoChun



Ted Dunning-3 wrote:
> 
> 
> Your question is very hard to understand.  The problem may be the names
> of the different kinds of servers.
> 
> There is one namenode and there are many datanodes.
> 
> Each file is divided into one or more blocks.  By default a block has a
> maximum size of 64MB.  Each block of a file is stored on one or more
> datanodes.  The number of datanodes holding each block is called the
> replication factor.  The namenode holds information about which blocks
> belong to each file, and also about which blocks each datanode holds.
> 
> As an example, consider that you have 3 files called A, B, and C.  Each
> file is 150MB, so each one has two full-size blocks (A1, A2, B1, B2, C1,
> C2) and one partial block of 22MB (A3, B3, C3).
> 
> Suppose that replication factor is 1 for A, 2 for B and 3 for C.
> 
> One possible state of five datanodes is this:
> 
> Datanode1:
> A1, B2, C3, C1
> 
> Datanode2:
> A2, C2, B2
> 
> Datanode3:
> A3, C1, C3, B1
> 
> Datanode4:
> B1, C1, C2, B3
> 
> Datanode5:
> B3, C2, C3
> 
> The namenode would contain this information:
> 
> A -> (A1, A2, A3)
> B -> (B1, B2, B3)
> C -> (C1, C2, C3)
> 
> A1 -> (Datanode1)
> B1 -> (Datanode3, Datanode4)
> C1 -> (Datanode1, Datanode3, Datanode4)
>   ... And so on ...
> 
> Does that help?
> 
> On 9/10/07 8:04 PM, "ChaoChun Liang" <[EMAIL PROTECTED]> wrote:
> 
> 
>> 
>> In my application, it matters whether the M blocks (described above)
>> all exist on the same datanode (i.e. each datanode owns a complete set
>> of the M blocks), or whether the M blocks are shared among the
>> datanodes in the HDFS.
>> 
>> If these M blocks can be shared, we may use HDFS; otherwise we may
>> consider the local file system for the map/reduce processing.
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12649233
Sent from the Hadoop Users mailing list archive at Nabble.com.

