[ https://issues.apache.org/jira/browse/HADOOP-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Chansler updated HADOOP-2259:
------------------------------------

    Component/s: dfs

> Replication should be decoupled from heartbeat
> ----------------------------------------------
>
>                 Key: HADOOP-2259
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2259
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.0
>         Environment: Hadoop 80 node cluster
>            Reporter: Srikanth Kakani
>
> I did a simple experiment of shooting down one node in the cluster and
> measuring the time taken to replicate the under-replicated blocks.
> ~30000 blocks were under-replicated, i.e. ~400 per node, which should take
> 200 minutes to replicate completely given a 1 minute heartbeat interval
> (2 blocks per heartbeat).
> My findings: it took around 220 minutes, which is reasonable.
> Bug: Replication is coupled with heartbeat. The heartbeat interval is based
> on how much a namenode can handle. Replication should be based on how much a
> datanode can handle.
> Given the default heartbeat interval of 20 seconds, we computed that
> datanodes can handle 2 replications in that interval, which is why the
> namenode gives 2 blocks per heartbeat to replicate.
> What we propose is to keep the ratio of 2 blocks per 20 seconds constant, so
> a datanode coming in with a heartbeat interval of 1 minute should be given
> 6 blocks to replicate per heartbeat. In this case, instead of taking 200
> minutes, it should take 200/3 minutes, roughly 1 hour, to replicate the
> entire node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
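
The arithmetic in the issue description can be checked with a small sketch. It is not Hadoop code: the function names and constants are hypothetical and only encode the figures quoted above (2 blocks per 20 seconds, ~400 blocks per node, a 1 minute heartbeat). It assumes each datanode works only on the blocks assigned at its own heartbeats.

```python
import math

# Figures quoted in the issue description (not read from any Hadoop config).
BASE_INTERVAL_S = 20   # default heartbeat interval, in seconds
BASE_BLOCKS = 2        # blocks a datanode is assumed to handle per 20 seconds


def blocks_per_heartbeat(heartbeat_interval_s):
    """Hypothetical quota: scale the block count with the heartbeat interval."""
    return max(1, int(BASE_BLOCKS * heartbeat_interval_s // BASE_INTERVAL_S))


def replication_minutes(blocks_per_node, blocks_each_heartbeat, heartbeat_interval_s):
    """Minutes needed to hand out all blocks, one batch per heartbeat."""
    heartbeats = math.ceil(blocks_per_node / blocks_each_heartbeat)
    return heartbeats * heartbeat_interval_s / 60


if __name__ == "__main__":
    interval_s = 60  # the 1 minute heartbeat used in the experiment
    # Current behaviour: a fixed 2 blocks per heartbeat.
    print(replication_minutes(400, BASE_BLOCKS, interval_s))
    # Proposed behaviour: quota scaled to the heartbeat interval.
    print(replication_minutes(400, blocks_per_heartbeat(interval_s), interval_s))
```

With these inputs the fixed quota gives 200 minutes and the scaled quota of 6 blocks gives 67 minutes, matching the 200/3 estimate in the description.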