[jira] [Commented] (HDFS-6801) Archival Storage: Add a new data migration tool

Jing Zhao (JIRA) Thu, 14 Aug 2014 13:59:31 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097643#comment-14097643
 ]


Jing Zhao commented on HDFS-6801:
---------------------------------

Some quick comments:
# Currently the Mover always scans the whole namespace. Maybe we should allow users 
to specify a list of paths for migration. This will also be useful in a shared 
cluster.
# Currently the Mover goes through the whole namespace and finishes all the 
check/schedule work before starting the real migration work in the dispatcher. 
Going through the whole namespace may take a lot of time, so maybe we should 
start dispatching as soon as some work has been scheduled? But we can do this 
in a separate jira as an optimization.
# For a path ending with ".snapshot" (e.g., /foo/.snapshot/), {{getFileInfo}} 
can only return a fake HdfsFileStatus. We may need to call {{getListing}} to 
get all the snapshots under the snapshottable directory.
{code}
if (snapshottableDirs != null && snapshottableDirs.contains(dir)) {
  final String snapshotPath = dir + HdfsConstants.DOT_SNAPSHOT_DIR;
  try {
    final HdfsFileStatus snapshotFileInfo = dfs.getFileInfo(snapshotPath);
    processDirRecursively(snapshotPath, snapshotFileInfo);
{code}
# A file can be included in both the current fs directory and snapshots. It 
looks like the current patch will schedule such a file multiple times, since 
we process both the snapshot paths and the normal paths. Will that cause any 
conflicts? We may want to do the extra processing only for files that have 
been deleted and exist only in snapshots.
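
To illustrate the last point, here is a minimal sketch in plain Java (not Mover code and not taken from the patch) of scheduling each file at most once when it is reachable through both a normal path and a snapshot path. {{ScannedFile}}, {{pathsToSchedule}}, and the idea of keying on an inode id are assumptions made for illustration only; the sketch assumes that a file seen under a snapshot path reports the same id as its live copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnapshotDedupSketch {
  /** Hypothetical stand-in for one scanned file: its path and its inode id. */
  static final class ScannedFile {
    final String path;
    final long fileId;

    ScannedFile(String path, long fileId) {
      this.path = path;
      this.fileId = fileId;
    }
  }

  /**
   * Returns the paths to schedule. Live paths are visited first, so a
   * snapshot path is kept only when its id was not already seen, i.e. the
   * file was deleted from the current directory and exists only in snapshots.
   */
  static List<String> pathsToSchedule(List<ScannedFile> livePaths,
                                      List<ScannedFile> snapshotPaths) {
    Set<Long> seenIds = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (ScannedFile f : livePaths) {
      if (seenIds.add(f.fileId)) {
        result.add(f.path);
      }
    }
    for (ScannedFile f : snapshotPaths) {
      if (seenIds.add(f.fileId)) {
        result.add(f.path);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ScannedFile> live = Arrays.asList(new ScannedFile("/foo/a", 1L));
    List<ScannedFile> snap = Arrays.asList(
        new ScannedFile("/foo/.snapshot/s1/a", 1L),
        new ScannedFile("/foo/.snapshot/s1/b", 2L));
    System.out.println(pathsToSchedule(live, snap));
  }
}
```

Whether an inode id is the right key in the real Mover is an open question; the sketch only shows the "process live paths first, then snapshot-only files" ordering.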

> Archival Storage: Add a new data migration tool 
> ------------------------------------------------
>
>                 Key: HDFS-6801
>                 URL: https://issues.apache.org/jira/browse/HDFS-6801
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer, namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h6801_20140813.patch, h6801_20140814.patch, 
> h6801_20140814b.patch
>
>
> The tool is similar to Balancer.  It periodically scans the blocks in HDFS and 
> uses path and/or other metadata (e.g. mtime) to determine if a block should 
> be cooled down (i.e. hot => warm, or warm => cold) or warmed up (i.e. cold => 
> warm, or warm => hot).  In contrast to Balancer, the migration tool always 
> moves replicas to a different storage type.  Similar to Balancer, the replicas 
> are moved in a way that the number of racks holding the block does not decrease.
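
The rack constraint in the description above can be sketched as a simple check. This is an illustrative sketch only, not the Balancer or Mover implementation: {{RackCheckSketch}}, {{keepsRackCount}}, and the representation of racks as plain strings are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RackCheckSketch {
  /**
   * Returns true if replacing the replica at index sourceIndex with a
   * replica on targetRack leaves the block on at least as many distinct
   * racks as before the move.
   */
  static boolean keepsRackCount(List<String> replicaRacks,
                                int sourceIndex,
                                String targetRack) {
    Set<String> before = new HashSet<>(replicaRacks);
    Set<String> after = new HashSet<>();
    for (int i = 0; i < replicaRacks.size(); i++) {
      // The moved replica contributes its new rack; the others are unchanged.
      after.add(i == sourceIndex ? targetRack : replicaRacks.get(i));
    }
    return after.size() >= before.size();
  }
}
```

For example, with replicas on racks r1, r2, r2, moving the r1 replica to r2 would be rejected by this check, while moving one of the r2 replicas to r1 or to a new rack r3 would be accepted.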



--
This message was sent by Atlassian JIRA
(v6.2#6252)
