It can take a long time to decide that a node is down. If that down node has the last copy of a file, then it won't get replicated.
I run a balancing script every few hours. It walks through the files and temporarily raises the replication count of each one. This matters because the initial allocation of blocks isn't done as well as the allocation for increased replication. It also makes the system respond sooner to files with a low replication count: if a datanode is down, the remaining nodes will respond to the increased replication count, and the down node won't respond to requests to delete the block once the count is lowered again. The result is a desirable improvement in replication for those nearly orphaned blocks.

On 1/4/08 1:02 PM, "Raghu Angadi" <[EMAIL PROTECTED]> wrote:

> This is of course not expected. A more detailed info or log message
> would help. Do you know if there is at least one good block? Sometimes,
> the remaining "good" block might actually be corrupted and thus can not
> replicate itself. Restarting might just have brought up the datanodes
> that were down (for whatever reason) before the restart.
>
> Raghu.
>
> Chris Kline wrote:
>> fsck reports several under replicated blocks, but these do not get fixed
>> until I restart DFS. fsck also reports a missing block at the same
>> time, but this should affect the function of fixing under replicated
>> blocks. Has anyone seen this before?
>>
>> I'm running 0.15.0.
>>
>> -Chris Kline
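
For reference, the balancing pass I described amounts to something like the sketch below. This is not my actual script, just a minimal illustration. It assumes the `hadoop` command is on the PATH and that `hadoop dfs -setrep <rep> <path>` is available in your version. The function names, the normal replication of 3, the bump of 1 and the hold time are all placeholders I made up for the example.

```python
import subprocess
import time


def setrep_command(path, replication, hadoop_bin="hadoop"):
    """Build the `hadoop dfs -setrep` invocation for one file."""
    return [hadoop_bin, "dfs", "-setrep", str(replication), path]


def bump_replication(paths, normal=3, bump=1, hold_seconds=600,
                     run=subprocess.check_call, sleep=time.sleep):
    """Temporarily raise the replication of each file, then restore it.

    `normal`, `bump` and `hold_seconds` are illustrative defaults,
    not values taken from any real cluster.
    """
    # Raise the target so live datanodes create an extra replica.
    for path in paths:
        run(setrep_command(path, normal + bump))

    # Give the cluster some time to act on the higher target.
    sleep(hold_seconds)

    # Drop back to the normal target; surplus replicas get deleted,
    # but a datanode that is down cannot act on a delete request.
    for path in paths:
        run(setrep_command(path, normal))
```

You would feed it the list of files from whatever listing you already have, for example `bump_replication(["/data/part-00000"])`. The `run` and `sleep` arguments are only there so the logic can be exercised without a cluster.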