[
https://issues.apache.org/jira/browse/HDFS-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157764#comment-15157764
]
Esteban Gutierrez commented on HDFS-8564:
-----------------------------------------
I had patched Hadoop 2.6 with HDFS-8845 to avoid running into this problem,
and I still noticed some IO spikes. I even tuned vm.vfs_cache_pressure to
cache as many inodes and dentries as possible, but the directory traversal
itself is too expensive whenever that directory hasn't been cached or the
cache has been evicted. I can confirm that HDFS-8845 has been helpful and the
frequency of the spikes has gone down, but scanning up to 64k directories can
still be really IO intensive.
> BlockPoolSlice.checkDirs() will trigger excessive IO while traversing all
> sub-directories under finalizedDir
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-8564
> URL: https://issues.apache.org/jira/browse/HDFS-8564
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0
> Reporter: Esteban Gutierrez
> Assignee: Esteban Gutierrez
> Priority: Critical
>
> DataNodes continuously call checkDiskErrorAsync() from multiple operations in
> the DN in order to verify that a volume hasn't experienced any failure. When
> DN.startCheckDiskErrorThread() is invoked, we need to traverse all configured
> data volumes on a DN to see which volumes need to be removed (see
> FsVolumeList.checkDir(s)). However, that means that for each directory on
> BlockPoolSlice we need to call DiskChecker.checkDirs(), which will
> recursively look into the rbw, tmp and finalized directories:
> {code}
> void checkDirs() throws DiskErrorException {
>   DiskChecker.checkDirs(finalizedDir);
>   DiskChecker.checkDir(tmpDir);
>   DiskChecker.checkDir(rbwDir);
> }
> {code}
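> For context, the expensive part is that the check recursively descends into
> every subdirectory. A minimal sketch of that traversal (my own illustration,
> not the actual DiskChecker code) that counts how many directories get read:
> {code}
> // Sketch of a recursive directory check, assuming DiskChecker.checkDirs()
> // behaves similarly: every subdirectory is visited, so a 256 x 256 tree
> // costs on the order of 64k directory reads per volume.
> import java.io.File;
> import java.io.IOException;
>
> public class RecursiveCheckSketch {
>   public static int checkDirs(File dir) throws IOException {
>     if (!dir.isDirectory() || !dir.canRead()) {
>       throw new IOException("Not a readable directory: " + dir);
>     }
>     int visited = 1;  // this directory
>     File[] children = dir.listFiles();
>     if (children != null) {
>       for (File child : children) {
>         if (child.isDirectory()) {
>           visited += checkDirs(child);  // one listFiles() per subdir
>         }
>       }
>     }
>     return visited;
>   }
> }
> {code}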
> Unfortunately after HDFS-6482, the subdirectory structure is created with the
> following algorithm:
> {code}
> public static File idToBlockDir(File root, long blockId) {
>   int d1 = (int) ((blockId >> 16) & 0xff);
>   int d2 = (int) ((blockId >> 8) & 0xff);
>   String path = DataStorage.BLOCK_SUBDIR_PREFIX + d1 + SEP +
>       DataStorage.BLOCK_SUBDIR_PREFIX + d2;
>   return new File(root, path);
> }
> {code}
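> For example (a quick illustration with a made-up block ID, not from the
> issue itself), bits 16-23 and 8-15 of the block ID select the two directory
> levels:
> {code}
> // Sketch of the HDFS-6482 layout: two 8-bit slices of the block ID pick
> // the subdirectories, giving 256 x 256 = 65,536 possible leaf dirs.
> public class BlockDirExample {
>   public static String idToSubdirPath(long blockId) {
>     int d1 = (int) ((blockId >> 16) & 0xff);
>     int d2 = (int) ((blockId >> 8) & 0xff);
>     return "subdir" + d1 + "/" + "subdir" + d2;
>   }
>
>   public static void main(String[] args) {
>     // blockId 0x12345678: d1 = 0x34 = 52, d2 = 0x56 = 86
>     System.out.println(idToSubdirPath(0x12345678L));  // subdir52/subdir86
>   }
> }
> {code}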
> This leaves each data volume with 64K directories (256 directories x 256
> subdirectories). A side effect of this is that if the dentries haven't been
> cached by the OS, the DN needs to recursively scan up to 64k directories
> times the number of configured data volumes (times the number of files),
> impacting IO for other operations while DiskChecker.checkDirs(finalizedDir)
> is running.
> There are a few possibilities to address this problem:
> 1. Do not scan finalizedDir at all.
> 2. Limit the recursive scan to one level of subdirectories (256).
> 3. Remove a subdirectory as soon as it no longer has any block under it.
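> Option 2 could be sketched as a depth-limited variant of the scan
> (illustrative only, not a proposed patch; names are my own):
> {code}
> // Illustrative depth-limited check: descend at most maxDepth levels, so
> // scanning finalizedDir with maxDepth = 1 costs at most 1 + 256 directory
> // reads instead of ~64k.
> import java.io.File;
> import java.io.IOException;
>
> public class DepthLimitedCheck {
>   public static void checkDirs(File dir, int maxDepth) throws IOException {
>     if (!dir.isDirectory() || !dir.canRead()) {
>       throw new IOException("Not a readable directory: " + dir);
>     }
>     if (maxDepth <= 0) {
>       return;  // stop descending; deeper subdirs are not scanned
>     }
>     File[] children = dir.listFiles();
>     if (children == null) {
>       return;  // listFiles() returns null on IO error or non-directory
>     }
>     for (File child : children) {
>       if (child.isDirectory()) {
>         checkDirs(child, maxDepth - 1);
>       }
>     }
>   }
> }
> {code}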
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)