Esteban Gutierrez created HDFS-8564:
---------------------------------------
Summary: BlockPoolSlice.checkDirs() will trigger excessive IO
while traversing all sub-directories under finalizedDir
Key: HDFS-8564
URL: https://issues.apache.org/jira/browse/HDFS-8564
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, HDFS
Affects Versions: 3.0.0
Reporter: Esteban Gutierrez
Priority: Critical
DataNodes continuously call checkDiskErrorAsync() for multiple operations in
the DN in order to verify that a volume hasn't experienced any failure. When
DN.startCheckDiskErrorThread() is invoked we need to traverse all configured
data volumes on a DN to see which volumes need to be removed (see
FsVolumeList.checkDir(s)). However, that means that for each directory on
BlockPoolSlice we need to call DiskChecker.checkDirs(), which recursively
looks into the finalized directory (rbw and tmp are only checked at the top
level via DiskChecker.checkDir()):
{code}
void checkDirs() throws DiskErrorException {
  DiskChecker.checkDirs(finalizedDir);
  DiskChecker.checkDir(tmpDir);
  DiskChecker.checkDir(rbwDir);
}
{code}
Unfortunately after HDFS-6482, the subdirectory structure is created with the
following algorithm:
{code}
public static File idToBlockDir(File root, long blockId) {
  int d1 = (int)((blockId >> 16) & 0xff);
  int d2 = (int)((blockId >> 8) & 0xff);
  String path = DataStorage.BLOCK_SUBDIR_PREFIX + d1 + SEP +
      DataStorage.BLOCK_SUBDIR_PREFIX + d2;
  return new File(root, path);
}
{code}
This leaves each data volume with 64K directories (256 directories x 256
subdirectories). A side effect of this is that if the dentries haven't been
cached by the OS, the DN needs to recursively scan up to 64K directories x
the number of configured data volumes (x number of files), impacting IO for
other operations while DiskChecker.checkDirs(finalizedDir) is running.
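To make the 64K figure concrete, here is a minimal standalone sketch of the
same mapping. It is not the Hadoop source: the class name is hypothetical, and
BLOCK_SUBDIR_PREFIX and SEP are redefined locally with assumed values so the
example compiles without Hadoop on the classpath.

```java
import java.io.File;

// Standalone illustration only; not the Hadoop implementation.
final class BlockDirLayoutSketch {
    // Assumed values, redefined here so the sketch has no Hadoop dependency.
    static final String BLOCK_SUBDIR_PREFIX = "subdir";
    static final String SEP = File.separator;

    // Same bit arithmetic as the idToBlockDir() snippet quoted above:
    // bits 16-23 of the block ID pick the first level, bits 8-15 the second.
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        String path = BLOCK_SUBDIR_PREFIX + d1 + SEP + BLOCK_SUBDIR_PREFIX + d2;
        return new File(root, path);
    }

    // Each level is indexed by one byte, so there are 256 x 256 possible leaves.
    static int maxLeafDirs() {
        return 256 * 256;
    }

    public static void main(String[] args) {
        File root = new File("finalized");
        System.out.println(idToBlockDir(root, 0x123456L));
        System.out.println(maxLeafDirs());
    }
}
```

For example, block ID 0x123456 has 0x12 (18) in bits 16-23 and 0x34 (52) in
bits 8-15, so it lands in subdir18/subdir52 under the root.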
There are a few possibilities to address this problem:
1. Do not scan finalizedDir at all.
2. Limit the recursive scan to one level of subdirectories (256).
3. Remove a subdirectory as soon as it no longer has any blocks under it.
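As a rough illustration of option 2, the sketch below bounds the recursion
depth of a directory check. It is only a sketch under stated assumptions: the
class and both methods are hypothetical, are not part of DiskChecker, and use
plain java.io.File permission checks as a stand-in for the real per-directory
check.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not an existing Hadoop API.
final class DepthLimitedDirCheckSketch {

    // Stand-in for a single-directory health check (assumption: existence
    // plus read/write/execute permission is what we want to verify).
    static void checkDir(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
        if (!dir.canRead() || !dir.canWrite() || !dir.canExecute()) {
            throw new IOException("Insufficient permissions: " + dir);
        }
    }

    // Checks dir itself, then descends at most maxDepth levels below it.
    // Returns the number of directories that were checked.
    static int checkDirs(File dir, int maxDepth) throws IOException {
        checkDir(dir);
        int checked = 1;
        if (maxDepth <= 0) {
            return checked; // depth budget exhausted: do not list children
        }
        File[] children = dir.listFiles();
        if (children == null) {
            throw new IOException("Cannot list directory: " + dir);
        }
        for (File child : children) {
            if (child.isDirectory()) {
                checked += checkDirs(child, maxDepth - 1);
            }
        }
        return checked;
    }
}
```

With maxDepth = 1 on finalizedDir, a call like this would touch at most the
root plus its 256 first-level subdirectories instead of all 64K leaves.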
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)