[
https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097643#comment-14097643
]
Jing Zhao commented on HDFS-6801:
---------------------------------
Some quick comments:
# Currently the Mover always scans the whole namespace. Maybe we should allow users
to specify a list of paths for migration. This will also be useful in a shared
cluster.
# Currently the Mover goes through the whole namespace and finishes all the
check/schedule work before starting the real migration work in the dispatcher.
Going through the whole namespace may take a lot of time, so maybe we
should start the dispatching work as soon as some work has been
scheduled? But we can do this in a separate jira as an optimization.
# For a path ending with ".snapshot" (e.g., /foo/.snapshot/), {{getFileInfo}}
can only return a fake HdfsFileStatus. We may need to call {{getListing}} to
get all the snapshots under the snapshottable directory.
{code}
if (snapshottableDirs != null && snapshottableDirs.contains(dir)) {
  final String snapshotPath = dir + HdfsConstants.DOT_SNAPSHOT_DIR;
  try {
    final HdfsFileStatus snapshotFileInfo = dfs.getFileInfo(snapshotPath);
    processDirRecursively(snapshotPath, snapshotFileInfo);
{code}
# A file can be included in both the current fs directory and snapshots. It looks
like the current patch will schedule this kind of file multiple times, since we
process both the snapshot paths and the normal paths. Will that cause any
conflicts? We may want to do the extra processing only for files that have been
deleted and exist only in snapshots.
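To illustrate the last two points, here is a minimal sketch of scheduling each file only once when both the current tree and the snapshots are scanned. It is not taken from the patch: {{SnapshotDedupSketch}}, {{FileEntry}} and {{selectOnce}} are hypothetical names, the snapshot map stands in for whatever a listing of the ".snapshot" directory would return, and it assumes that a file can be identified across the current tree and snapshots by an inode id. It also ignores the case where a snapshot copy and the current copy of the same file differ in their blocks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: plain Java stand-ins, no Hadoop classes are used. */
final class SnapshotDedupSketch {

    /** Hypothetical stand-in for a file status: full path plus inode id. */
    static final class FileEntry {
        final String path;
        final long fileId;

        FileEntry(String path, long fileId) {
            this.path = path;
            this.fileId = fileId;
        }
    }

    /**
     * Returns each distinct file once. Files in the current tree are taken
     * first. A snapshot copy is added only if its id has not been seen yet,
     * i.e. the file is gone from the current tree and was not already
     * picked up from an earlier snapshot.
     */
    static List<FileEntry> selectOnce(List<FileEntry> currentTree,
                                      Map<String, List<FileEntry>> snapshots) {
        Set<Long> seen = new HashSet<>();
        List<FileEntry> selected = new ArrayList<>();
        for (FileEntry e : currentTree) {
            if (seen.add(e.fileId)) {
                selected.add(e);
            }
        }
        // One map entry per snapshot name (assumed to come from a listing
        // of the ".snapshot" directory rather than from a single file info).
        for (List<FileEntry> snapshotFiles : snapshots.values()) {
            for (FileEntry e : snapshotFiles) {
                if (seen.add(e.fileId)) {
                    selected.add(e);
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileEntry> current = new ArrayList<>();
        current.add(new FileEntry("/foo/a", 1L));

        // Snapshot s1 still holds "a" and also a deleted file "b".
        List<FileEntry> s1 = new ArrayList<>();
        s1.add(new FileEntry("/foo/.snapshot/s1/a", 1L));
        s1.add(new FileEntry("/foo/.snapshot/s1/b", 2L));
        Map<String, List<FileEntry>> snapshots = new LinkedHashMap<>();
        snapshots.put("s1", s1);

        for (FileEntry e : selectOnce(current, snapshots)) {
            System.out.println(e.path);
        }
    }
}
```

With this shape, a file that is still present in the current tree is scheduled through its normal path, and the snapshot paths only contribute files that have since been deleted.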
> Archival Storage: Add a new data migration tool
> ------------------------------------------------
>
> Key: HDFS-6801
> URL: https://issues.apache.org/jira/browse/HDFS-6801
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h6801_20140813.patch, h6801_20140814.patch,
> h6801_20140814b.patch
>
>
> The tool is similar to Balancer. It periodically scans the blocks in HDFS and
> uses path and/or other metadata (e.g. mtime) to determine if a block should
> be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold =>
> warm, or warm => hot). In contrast to Balancer, the migration tool always
> moves replicas to a different storage type. Similar to Balancer, the replicas
> are moved in a way that the number of racks holding the block does not decrease.