Hello, I have a question about scaling down an HDFS cluster.

Say I have two datanodes that each maintain a datadir in /datadir. Given that I want to get rid of datanode2, is there any way for me to have datanode1 adopt control of datanode2's datadir without decommissioning or using replication?

I ask because I want to run MapReduce jobs on a Qsub cluster where I request some number of nodes for a given amount of time, and those nodes disappear at the end of the run (though their filesystems remain visible to me through NFS). After the nodes die, I probably won't be able to get the same nodes back when I ask to run another job, so I would need to keep operating on the 200 HDFS datadirs with some smaller set of workers.

It feels silly to run HDFS on top of an NFS setup, but I can take the performance hit, and it would make my code a lot cleaner if I didn't have to switch filesystems between compute environments (clouds and Qsub clusters). Is there any way of doing this?

I tried the obvious thing: building a two-datanode setup, then reconfiguring it as a one-node setup whose dfs.data.dir included both of the original datadirs. That didn't seem to work; a rough sketch of the config I tried is in the P.S. below.

Sorry if this gets asked a lot or if I'm looking over something obvious.

Thanks,
Ben
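P.S. In case it helps, here is roughly what the hdfs-site.xml on the one remaining datanode looked like when I tried this. The /nfs/node1/datadir and /nfs/node2/datadir paths are just placeholders for wherever the old datadirs show up on my NFS mounts; I understand dfs.data.dir takes a comma-separated list of storage directories (on newer Hadoop the property is named dfs.datanode.data.dir instead):

    <!-- hdfs-site.xml on the surviving datanode -->
    <property>
      <!-- dfs.datanode.data.dir on Hadoop 2.x and later -->
      <name>dfs.data.dir</name>
      <!-- comma-separated list: datanode1's own datadir plus datanode2's old one over NFS -->
      <value>/nfs/node1/datadir,/nfs/node2/datadir</value>
    </property>

The datanode starts with this, but it doesn't seem to serve the blocks from the second directory, which is what prompted the question.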