If 64 MB is your HDFS block size, then the 50 MB file won't be split; it would 
be stored in a single block in HDFS. AFAIK, which data node (or rather which 
data nodes) hold the block is decided by the name node. The block would be 
replicated and stored; by default the replication factor is 3. So in your case 
the block should be present on both of your data nodes.
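
You can verify this yourself with fsck, which reports the blocks of a file and 
the data nodes holding each replica (the path below is just an example; use 
your own file's HDFS path):

```shell
# List the file's blocks and the data nodes each replica lives on.
# A 50 MB file with a 64 MB block size should show exactly one block.
hadoop fsck /user/cheng/input.txt -files -blocks -locations
```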
When you use MapReduce to process the file, the number of mappers is decided by 
the number of input splits. In most clusters the MapReduce input split size 
would be the same as the HDFS block size. So in your case just one mapper would 
be triggered.
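
As a rough sketch (simplified from what FileInputFormat actually does, 
assuming split size equals block size and ignoring the slack factor it allows 
on the last split):

```python
import math

def num_splits(file_size_bytes, block_size_bytes):
    """Approximate number of input splits (and hence mappers) for a file,
    assuming the split size equals the HDFS block size. A non-empty file
    always yields at least one split."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

# 50 MB file, 64 MB block size -> one split, so one mapper
print(num_splits(50 * 1024**2, 64 * 1024**2))
```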
 
Regards
Bejoy K S

-----Original Message-----
From: 程笑 <xiaocheng...@163.com>
Date: Wed, 10 Aug 2011 16:33:18 
To: <hdfs-user@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: The problem about block size

Hi,
I have established a Hadoop cluster with one NameNode and two DataNodes.
Now I have a question about block size. I set the block size to 64 MB, and I 
store one text file (50 MB) on HDFS. Is this text file split? If not, on which 
DataNode is the text file stored? If I use MapReduce to compute on the text 
file, how many maps will process this file?
Please forgive my poor English.
Best regards! 
Cheng Xiao





