It could be that your process has hung because a particular resident
block (file) requires a very large replication factor, while your
remaining number of nodes is less than that value. That is a genuine
reason for the hang (but one that must be fixed). The decommission
process waits until there are no under-replicated blocks, so I'd use
fsck to check whether any such files are present and setrep them to a
lower value.
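For example, roughly (assuming a 0.20-era CLI; /path/to/file is a
placeholder for whatever fsck flags, and 3 is just an example target):

  # Find files with under-replicated blocks; the summary at the end
  # also reports the total under-replicated block count.
  hadoop fsck / -files -blocks | grep -i "under replicated"

  # Lower the replication factor of an offending file; -w waits
  # until the new target factor is actually satisfied.
  hadoop fs -setrep -w 3 /path/to/file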

On Fri, Aug 12, 2011 at 9:28 PM,  <[email protected]> wrote:
> Hi All,
>
> I'm trying to decommission a data node from my cluster.  I put the data node 
> in the /usr/lib/hadoop/conf/dfs.hosts.exclude list and restarted the name 
> node.  The under-replicated blocks are starting to replicate, but the count 
> is going down at a very slow pace.  For 1 TB of data it takes over a day to 
> complete.  We changed the settings as below and tried to increase the 
> replication rate.
>
> Added this to hdfs-site.xml on all the nodes on the cluster and restarted the 
> data nodes and name node processes.
> <property>
>  <!-- 131072000 bytes/sec = 125 MiB/s (~1 Gbit/s) -->
>  <name>dfs.balance.bandwidthPerSec</name>
>  <value>131072000</value>
> </property>
>
> Speed didn't seem to pick up. Do you know what may be happening?
>
> Thanks!
> Jonathan
>



-- 
Harsh J
