Hey hdfs gurus - One of my clusters is going through disk upgrades, and not all machines will have a homogeneous disk layout during the transition period. At first I started looking into auto-generating dfs.data.dir based on each machine's current profile, but then I looked at how disks are actually made available to the datanode.
Looking at makeInstance(), each directory listed in dfs.data.dir is tested and, if usable, added to the list of disks to use. As long as at least one directory passes the check, a datanode is started with just the usable disks.

Does it seem reasonable to push a single config to all hosts listing the full post-upgrade set of disks? As machines are upgraded and the new mount points come online, the extra disks will be picked up. Machines not yet upgraded will simply ignore the missing directories (the datanode won't have permission to create the missing dirs).

  public static DataNode makeInstance(String[] dataDirs, Configuration conf)
      throws IOException {
    ArrayList<File> dirs = new ArrayList<File>();
    for (int i = 0; i < dataDirs.length; i++) {
      File data = new File(dataDirs[i]);
      try {
        DiskChecker.checkDir(data);
        dirs.add(data);
      } catch (DiskErrorException e) {
        LOG.warn("Invalid directory in dfs.data.dir: " + e.getMessage());
      }
    }
    if (dirs.size() > 0)
      return new DataNode(conf, dirs);
    LOG.error("All directories in dfs.data.dir are invalid.");
    return null;
  }

Thoughts?

--travis
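P.S. For concreteness, the config I have in mind to push everywhere would look something like the snippet below (the mount points are made up for illustration). Hosts that have been upgraded would use all four directories; hosts still on the old layout would just log the "Invalid directory in dfs.data.dir" warning for the ones that don't exist yet and start with the rest:

  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data,/data/4/dfs/data</value>
  </property>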