Hi Karthiek,

I haven't checked 1.0.4, but in 2.2.0 and onwards there's a setting you can tweak:

dfs.datanode.balance.bandwidthPerSec

By default it's set to just 1 MB/s, which is pretty slow. Also, at least in 2.2.0, there's `hdfs dfsadmin -setBalancerBandwidth`, which can be used to adjust this config property at runtime.

Best,
Andrew

On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <karthi...@gmail.com> wrote:

> Hi all,
>
> I am working on a research project where we are looking at algorithms to
> "optimally" distribute data blocks in HDFS nodes. The definition of what is
> optimal is omitted for brevity.
>
> I want to move specific blocks of a file that is *already* in HDFS. I am
> able to achieve it using the data transfer protocol (took cues from the
> "Balancer" module). But the operation turns out to be very time consuming.
> In my cluster setup, moving 1 block of data (approximately 60 MB) from
> data-node-1 to data-node-2 takes nearly 60 seconds. A "dfs -put" operation
> that copies the same file from data-node-1's local file system to
> data-node-2 takes just 1.4 seconds.
>
> Any suggestions on how to speed up the movement of specific blocks?
> Bringing down the running time is very important for us because this
> operation may happen while executing a job.
>
> I am using hadoop-1.0.4.
>
> Thanks in advance!
>
> Best,
> Karthiek
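To make that concrete, here's a sketch of both approaches. The 100 MB/s value below is just an example, not a recommendation; pick whatever your network can spare. The persistent form goes in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: persistent setting, picked up when datanodes restart -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <!-- bytes per second; default is 1048576 (1 MB/s); 104857600 = 100 MB/s -->
  <value>104857600</value>
</property>
```

And the runtime form (2.2.0+), which takes the same bytes-per-second value:

```shell
# Adjusts the datanodes' balancer bandwidth without a restart
hdfs dfsadmin -setBalancerBandwidth 104857600
```

As far as I know, the value set via dfsadmin is not persistent: datanodes fall back to the hdfs-site.xml value after a restart.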