I have used the balancer to balance the data in the cluster with the -threshold option. The balancing bandwidth was set to 1 MB/sec (I think that's the default) in one of the config files, and it had to move 500GB of data around. It took some time, but eventually the data was spread out evenly. In my case I was using one machine as both the master node and a DataNode at the same time, which is why that one machine consumed more disk than the other DataNodes.
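For anyone finding this in the archives, a rough sketch of what that looks like (the 10% threshold and 10 MB/s value are just examples; check the property name and default against your Hadoop version's docs):

    # Run the balancer until every DataNode is within 10% of the
    # cluster-wide average utilization
    $ bin/start-balancer.sh -threshold 10

    <!-- hdfs-site.xml: raise the per-DataNode balancing bandwidth
         from the 1 MB/sec (1048576 bytes/sec) default -->
    <property>
      <name>dfs.balance.bandwidthPerSec</name>
      <value>10485760</value>
    </property>

As far as I know the bandwidth property is read at DataNode startup, so it needs a restart to take effect.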

Thanks,
Usman


Hey Alex,

Will the Hadoop balancer utility work in this case?

Pankil

On Mon, Jun 22, 2009 at 4:30 PM, Alex Loddengaard <a...@cloudera.com> wrote:

Are you seeing any exceptions because of the disk being at 99% capacity?

Hadoop should do something sane here and write new data to the disk with
more capacity. That said, it is ideal to be balanced. As far as I know,
there is no way to balance an individual DataNode's hard drives (Hadoop
does round-robin scheduling when writing data).
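(A minimal illustration of what that round-robin operates over; the paths are made up for this example:

    <!-- hdfs-site.xml: the DataNode writes new blocks round-robin
         across every directory listed here -->
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/hdfs/data,/mnt2/hdfs/data</value>
    </property>

so once /mnt fills up, new writes should land on /mnt2, but nothing moves existing blocks between the two.)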

Alex

On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo <kjirapi...@biz360.com> wrote:

> Hi all,
>    How does one handle a mount running out of space for HDFS?  We have
> two disks mounted on /mnt and /mnt2 respectively on one of the machines
> that are used for HDFS, and /mnt is at 99% while /mnt2 is at 30%.  Is
> there a way to tell the machine to balance itself out?  I know you can
> balance the cluster using start-balancer.sh, but I don't think that will
> tell the individual machine to balance itself out.  Our "hack" right now
> would be just to delete the data on /mnt; since we have 3x replication,
> we should be OK.  But I'd prefer not to do that.  Any thoughts?
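One workaround people mention for balancing disks within a single DataNode (I haven't verified it on every version, so treat it as a sketch and test it first): stop the DataNode, move whole block files together with their .meta checksum files from the full data directory to the emptier one, and restart; the DataNode rescans its data directories on startup. Roughly:

    # Hypothetical block ID; the block file and its .meta file must move
    # as a pair, and blocks may also live under current/subdir*/
    $ bin/hadoop-daemon.sh stop datanode
    $ mv /mnt/hdfs/data/current/blk_1234567890 \
         /mnt/hdfs/data/current/blk_1234567890_1001.meta \
         /mnt2/hdfs/data/current/
    $ bin/hadoop-daemon.sh start datanode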

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
