On 05/08/10 19:28, Raj V wrote:
Thank you. I realized that I was running the datanode on the namenode and
stopped it, but did not know that the first copy went to the local node.

Raj

It's a placement decision that makes sense for code running as MR jobs, ensuring that the output of work goes to the local machine and not somewhere random, but on big imports like yours you get penalised.
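[Editor's note: a quick way to see this placement for yourself, sketched here rather than taken from the original mail, is to ask the namenode where each block of a freshly written file landed. The path argument is whatever file you just imported.]

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowReplicaHosts {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            // One BlockLocation per block of the file. If the file was
            // written from a machine running a datanode, that node's
            // hostname appears in every block's host list, because the
            // first replica of each block was placed locally.
            for (BlockLocation loc :
                    fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(loc.getOffset() + " -> "
                    + Arrays.toString(loc.getHosts()));
            }
        }
    }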

Some datacentres have one or two IO nodes in the cluster that aren't running HDFS datanodes or task trackers, but let you get at the data at full datacentre rates, just to help with this kind of problem. Otherwise, if you can implement your import as a MapReduce job, Hadoop can do the work for you.
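[Editor's note: the shape of such an import job is roughly the sketch below, which is an illustration of the idea rather than anything from the original thread. It is a map-only job whose input is a text file listing source URLs, one per line; each task fetches its URLs and writes them into HDFS from its own node, so first replicas spread across the cluster instead of piling up on a single import host. The /imports/ target directory and the URL-list input are made up, and a real job would add retries and error handling.]

    import java.io.InputStream;
    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ParallelImport {

        public static class FetchMapper
                extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws java.io.IOException, InterruptedException {
                String line = value.toString().trim();
                if (line.isEmpty()) {
                    return;
                }
                URL src = new URL(line);
                FileSystem fs = FileSystem.get(ctx.getConfiguration());
                // The write happens on this task's node, so the first
                // replica of every block lands here rather than on one
                // overloaded import machine.
                Path dst = new Path("/imports/"
                    + new Path(src.getPath()).getName());
                try (InputStream in = src.openStream();
                     FSDataOutputStream out = fs.create(dst, true)) {
                    IOUtils.copyBytes(in, out, 64 * 1024);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "parallel-import");
            job.setJarByClass(ParallelImport.class);
            job.setMapperClass(FetchMapper.class);
            job.setNumReduceTasks(0);                  // map-only job
            // One URL per map task, so the fetches run in parallel
            // instead of one mapper reading the whole list.
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 1);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(NullWritable.class);
            job.setOutputFormatClass(NullOutputFormat.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

[DistCp implements essentially this pattern for filesystem-to-filesystem copies, so if the source is reachable as a Hadoop FileSystem it may save writing any code.]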

-steve
