On Thu, Jun 18, 2009 at 3:43 PM, rajeev gupta <graj1...@yahoo.com> wrote:

>
> I have this doubt regarding HDFS. Suppose I have 3 machines in my HDFS
> cluster and replication factor is 1. A large file is there on one of those
> three cluster machines in its local file system. If I put that file in HDFS
> will it be divided and distributed across all three machines? I have this
> doubt because HDFS assumes that "moving computation is cheaper than moving data".
>
> If file is distributed across all three machines, lots of data transfer
> will be there, whereas, if file is NOT distributed then compute power of
> other machine will be unused. Am I missing something here?
>
> -Raj
>
>
>
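Irrespective of what you set as the replication factor, a file larger than
one block is always split into blocks (the block size is whatever you set as
your HDFS block size), and those blocks are stored on the DataNodes in your
cluster. One caveat: if you run the put from a machine that is itself a
DataNode, the first replica of each block is written to that local node, so
with a replication factor of 1 the whole file can end up on that one machine.
Running the put from a machine that is not a DataNode lets the blocks spread
across the cluster.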
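
To make the splitting concrete, here is a rough sketch of the arithmetic in
Python. It is not Hadoop code: the function name split_into_blocks is made up
for illustration, and the 64 MB block size and 200 MB file size are just
assumed example values. It only shows how a file breaks into block-sized
pieces and says nothing about which machine each block lands on.

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs, one per block, for a file of file_size bytes.

    Hypothetical helper for illustration only; it mimics how a file is cut
    into fixed-size blocks, with a shorter final block if the size does not
    divide evenly.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block holds whatever is left over.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


MB = 1024 * 1024

# Assumed example: a 200 MB file with a 64 MB block size.
blocks = split_into_blocks(200 * MB, 64 * MB)
for index, (offset, length) in enumerate(blocks):
    print(index, offset // MB, length // MB)
```

With these example numbers you get three full 64 MB blocks plus one 8 MB
block. Each of those blocks is what gets placed on a DataNode.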


-- 
Harish Mallipeddi
http://blog.poundbang.in
