Is there a way to get HDFS (Hadoop 0.19.1) to automatically move blocks around to fix misreplicated blocks after a redefinition of which datanodes are in which racks?
When we first set up our HDFS cluster, we had 64 nodes in a single rack. Since we had only one rack, we left topology.script.file.name empty. Now we have 3 racks and have set topology.script.file.name to our own script defining which node is in which rack. However, this now means that 'hadoop fsck /' reports every file as having misreplicated blocks.
I've tried restarting the namenode and running the balancer, but neither had fixed any of the misreplicated blocks when I checked again the next morning. We have 100TB of data in HDFS, so fixing this manually is not an option. :)
--Mike
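
For context, a script named by topology.script.file.name is called with one or more datanode addresses as arguments and is expected to print one rack path per argument. The following is only a minimal sketch of that kind of script, not the one used on this cluster: the IP prefixes, the rack names, and the rack_for helper are all hypothetical.

```python
#!/usr/bin/env python
# Sketch of a rack-mapping script for topology.script.file.name.
# The prefixes and rack names below are made up for illustration.
import sys

# Hypothetical mapping from IP prefix to rack path.
RACKS = {
    "10.0.1.": "/rack1",
    "10.0.2.": "/rack2",
    "10.0.3.": "/rack3",
}

# Rack path returned for any address that matches no known prefix.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for one datanode address."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per argument, in argument order,
    # separated by spaces.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

Changing the output of such a script only changes which rack the namenode believes each datanode is in; it does not by itself move any existing block replicas, which is the situation described above.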
