On Thu, Jun 18, 2009 at 3:43 PM, rajeev gupta <graj1...@yahoo.com> wrote:
> I have this doubt regarding HDFS. Suppose I have 3 machines in my HDFS
> cluster and the replication factor is 1. A large file sits on one of those
> three cluster machines, in its local file system. If I put that file into
> HDFS, will it be divided and distributed across all three machines? I have
> this doubt because HDFS assumes that "moving computation is cheaper than
> moving data".
>
> If the file is distributed across all three machines, there will be a lot
> of data transfer, whereas if the file is NOT distributed, the compute power
> of the other machines will go unused. Am I missing something here?
>
> -Raj
>

Irrespective of what you set as the replication factor, large files will
always be split into chunks (the chunk size is whatever you set as your HDFS
block size), and the chunks will be distributed across your cluster.

--
Harish Mallipeddi
http://blog.poundbang.in
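
To make the splitting concrete, here is a minimal sketch of the arithmetic only. It is not HDFS code: the function name split_into_blocks is hypothetical, and the 64 MB default is an assumption (use whatever block size your cluster is configured with). Which machine each block ends up on is decided by HDFS's block placement policy and is not modeled here.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes in bytes of the blocks a file would be split into.

    Hypothetical helper for illustration; block_size defaults to an
    assumed 64 MB and should match the cluster's configured block size.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    blocks = split_into_blocks(1000 * MB)
    # A 1000 MB file: 15 full 64 MB blocks plus one 40 MB block.
    print(len(blocks), blocks[-1] // MB)  # prints "16 40"
```

With a replication factor of 1, each of those blocks is stored exactly once, so the number of blocks is also the number of units that can be spread over the machines.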