Oh. Thanks for the reply. Regards, Krishna On Jul 13, 2010, at 9:51 AM, Allen Wittenauer wrote:
> When you write on a machine running a datanode process, the data is *always*
> written locally first. This is to provide an optimization to the MapReduce
> framework. The lesson here is that you should *never* use a datanode
> machine to load your data. Always do it outside the grid.
>
> Additionally, you can use fsck (filename) -files -locations -blocks to see
> where those blocks have been written.
>
> On Jul 13, 2010, at 9:45 AM, Nathan Grice wrote:
>
>> To test the block distribution, run the same put command from the NameNode
>> and then again from the DataNode, and check the HDFS filesystem after both
>> commands. In my case, a 2GB file was distributed mostly evenly across the
>> datanodes when put was run on the NameNode, but was stored only on the
>> DataNode where I ran the put command otherwise.
>>
>> On Tue, Jul 13, 2010 at 9:32 AM, C.V.Krishnakumar
>> <[email protected]> wrote:
>>
>>> Hi,
>>> I am a newbie. I am curious to know how you discovered that all the
>>> blocks are written to the datanode's hdfs. I thought the replication by
>>> the namenode was transparent. Am I missing something?
>>> Thanks,
>>> Krishna
>>>
>>> On Jul 12, 2010, at 4:21 PM, Nathan Grice wrote:
>>>
>>>> We are trying to load data into hdfs from one of the slaves, and when
>>>> the put command is run from a slave (datanode), all of the blocks are
>>>> written to that datanode's hdfs and not distributed to all of the nodes
>>>> in the cluster. It does not seem to matter what destination format we
>>>> use (/filename vs hdfs://master:9000/filename); it always behaves the
>>>> same.
>>>> Conversely, running the same command from the namenode distributes the
>>>> files across the datanodes.
>>>>
>>>> Is there something I am missing?
>>>>
>>>> -Nathan
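Putting Allen's two suggestions together, the check might look like the sketch below. The file path is hypothetical (substitute your own), and it assumes the `hadoop` client is on your PATH and configured to reach the cluster:

```shell
# Load the file from a machine *outside* the grid (no local datanode),
# so the first replica is not pinned to the loading machine:
hadoop fs -put data.bin /user/nathan/data.bin

# Then ask fsck which datanodes hold each block of the file:
hadoop fsck /user/nathan/data.bin -files -blocks -locations
```

The fsck output lists each block of the file along with the datanode addresses holding its replicas, which is how you can confirm whether the blocks landed on one node or were spread across the cluster.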
