[
https://issues.apache.org/jira/browse/HDFS-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Foley updated HDFS-1687:
-----------------------------
Attachment: HDFS-1687_DirScan_v1.patch
test-patch results:
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.
-1 release audit. The applied patch generated 99 release audit warnings (more
than the trunk's current 98 warnings).
+1 system test framework. The patch passed system test framework compile.
I couldn't tell where the one "extra" release audit warning came from: none of
the 99 flagged files were files I had changed, and the only .java files flagged
were in the "src/contrib/thriftfs" directories.
> HDFS Federation: DirectoryScanner changes for federation
> --------------------------------------------------------
>
> Key: HDFS-1687
> URL: https://issues.apache.org/jira/browse/HDFS-1687
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: Federation Branch
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: Federation Branch
>
> Attachments: HDFS-1687_DirScan_v1.patch
>
>
> DirectoryScanner scans nearly the entire directory tree of each volume.
> It needs to be extended to work with Blockpools in Federation.
> Design notes:
> 1. The subdirectories of active bpids will be scanned. Active bpids are
> those associated with currently connected Namenodes. Each Volume knows the
> set of all active bpids, via volume.map.keySet(). I'll add a
> package-private accessor in FSVolume to return the set of active bpids for
> use by DirectoryScanner, DataBlockScanner, etc. DirectoryScanner will ignore
> the subdirectories of inactive bpids; see item 3.
> 2. There is no need to compare the volume's set of active bpids with the
> global set because, given how the code works, the two cannot differ.
> If differences do arise, they will be fixed automatically by the next
> restart of either the Datanode or the Namenode.
> 3. Inactive bpids will be ignored. Until we are connected to the owning
> Namenode, we cannot know whether a bpid subdirectory is correctly formatted,
> has snapshot data, etc. So it doesn't make sense to try to manage the data
> under an inactive bpid.
> 4. DirectoryScanner is currently instantiated and periodically triggered by
> DataBlockScanner. Other than both being "scanners", these two modules have
> little in common, and the triggering code is confusing. (DirectoryScanner
> scans filesystem directory trees every hour, to detect and fix
> inconsistencies between disk directories and ReplicasMap. DataBlockScanner
> runs every 3 weeks, and traverses all block files, actually reading them out
> and checksumming them to detect block corruption.)
> Separating them, and running DirectoryScanner under its own periodic
> scheduler, is a small change that will make the code much clearer. It
> already runs on its own FixedThreadPool Executor, so it is easy to change it
> to a ScheduledThreadPool, and instantiate it from DataNode.postStartInit() at
> the same time as initBlockScanner() is called.
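
As a rough illustration of design notes 1 and 4 above (not the contents of the
attached patch), the sketch below shows a volume exposing its active bpids
through an accessor backed by map.keySet(), and a scanner driven by its own
scheduled executor instead of being triggered by another scanner. The names
DirScanSketch, Volume, DirScanner, getActiveBpids() and start() are
hypothetical stand-ins, as are the example bpids and paths; only the
java.util.concurrent calls are real APIs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DirScanSketch {

    /** Hypothetical stand-in for FSVolume: one map entry per active block pool. */
    static class Volume {
        // bpid -> block pool directory (the value type is illustrative only)
        private final Map<String, String> map = new ConcurrentHashMap<>();

        void addBlockPool(String bpid, String dir) {
            map.put(bpid, dir);
        }

        /** Package-private accessor: a read-only snapshot of the active bpids. */
        Set<String> getActiveBpids() {
            return Collections.unmodifiableSet(new TreeSet<>(map.keySet()));
        }
    }

    /** Hypothetical scanner: visits only the active bpids of one volume. */
    static class DirScanner implements Runnable {
        private final Volume volume;
        private final Set<String> scanned = ConcurrentHashMap.newKeySet();
        private final CountDownLatch firstPass = new CountDownLatch(1);

        DirScanner(Volume volume) {
            this.volume = volume;
        }

        @Override
        public void run() {
            for (String bpid : volume.getActiveBpids()) {
                // Real code would reconcile on-disk block files with the
                // in-memory replica map here; the sketch only records the visit.
                scanned.add(bpid);
            }
            firstPass.countDown();
        }

        Set<String> scannedBpids() {
            return new TreeSet<>(scanned);
        }

        boolean awaitFirstPass(long timeout, TimeUnit unit) throws InterruptedException {
            return firstPass.await(timeout, unit);
        }
    }

    /** Runs the scanner on its own periodic scheduler, starting immediately. */
    static ScheduledExecutorService start(DirScanner scanner, long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        ses.scheduleAtFixedRate(scanner, 0, period, unit);
        return ses;
    }

    public static void main(String[] args) throws Exception {
        Volume volume = new Volume();
        volume.addBlockPool("BP-1", "/data/current/BP-1");
        volume.addBlockPool("BP-2", "/data/current/BP-2");

        DirScanner scanner = new DirScanner(volume);
        ScheduledExecutorService ses = start(scanner, 1, TimeUnit.HOURS);
        scanner.awaitFirstPass(5, TimeUnit.SECONDS);
        System.out.println("Scanned bpids: " + scanner.scannedBpids());
        ses.shutdownNow();
    }
}
```

One point to keep in mind with scheduleAtFixedRate: if a run throws an
exception, later runs are suppressed, so a real scan body would probably want
to catch and log failures rather than let them propagate.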
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira