[ 
https://issues.apache.org/jira/browse/HDFS-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137209#comment-16137209
 ] 

Manoj Govindassamy commented on HDFS-12191:
-------------------------------------------

Thanks for working on this [~yzhangal]. My comments below.

1. {{DFSConfigKeys}}
{noformat}
DFS_NAMENODE_DONT_CAPTURE_ACCESSTIME_ONLY_CHANGE_IN_SNAPSHOT = 
"dfs.namenode.dont.capture.accesstime.only.change.in.snapshot"
{noformat}
1.1 Any other better name for the above new config key? We have one other 
snapshot related config with the prefix "dfs.namenode.snapshot...". May be we 
should have all snapshot related one under the same prefix for grouping and 
consistency. How about "dfs.namenode.snapshot.skip.accesstime-only-diff. Your 
thoughts?
1.2 Config and the default value lines have checkstyle issue. can you please 
take care of this?

2. {{DirectoryWithSnapshotFeature}}
2.1 new blank line added to line 50 is not needed
2.2 
{noformat}
        } else {
          if (!dirCopy.metadataEquals(sdiff.snapshotINode)) {
            dirMetadataChanged = true;
          }
        }
{noformat}
Is this different compared to the existing "else if" block. Doesn't look so. 
Can be reverted.

3. {{hdfs-default.xml}}
3.1 Once the config key is changed, need to be incorporated here
3.2 "..it will not be captured in next snapshot." sounds ambiguous. The access 
time change history is not preserved. The file is still part of snapshot. Can 
we reword this? Also, there is typo in "lastest"

4. {{class INode}}
{noformat}
  private static boolean dontCaptureAccessTimeOnlyChangeInSnapshot = false;

  public static void setDontCaptureAccessTimeOnlyChangeInSnapshot(boolean s) {
    LOG.info("Setting dontCaptureAccessTimeOnlyChangeInSnapshot to " + s);
    dontCaptureAccessTimeOnlyChangeInSnapshot = s;
  }
{noformat}
4.1 Any other better ways of doing this instead of adding a static member to 
the core INode class. Snapshot manager can be the one to read all the 
configuration related to snapshots and it can take the decision accordingly 
instead of having the logic at the INode level

{noformat}
  public static boolean getDontCaptureAccessTimeOnlyChangeInSnapshot() {
    return dontCaptureAccessTimeOnlyChangeInSnapshot;
  }
{noformat}
4.2 Above method is never used.

5. {{FSDirAttrOp}}

{noformat}
  static boolean unprotectedSetTimes(
      FSDirectory fsd, INodesInPath iip, long mtime, long atime, boolean force)
          throws QuotaExceededException {
    ....
    if (mtime != -1) {
      inode = inode.setModificationTime(mtime, latest);
      status = true;
    }
    ...
    if (atime != -1 && (status || force
        || atime > inode.getAccessTime() + fsd.getAccessTimePrecision())) {
      inode.setAccessTime(atime, latest);
      status = true;
    }
}
{noformat}

>> "if there is other modification made to the file, the latest access
    time will be captured together with the modification in next snapshot."

Looks like accessTime will always be skipped with the config turned on. Please 
correct me if my understanding from the code is wrong.





> Provide option to not capture the accessTime change of a file to snapshot if 
> no other modification has been done
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12191
>                 URL: https://issues.apache.org/jira/browse/HDFS-12191
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 3.0.0-beta1
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-12191.001.patch
>
>
> Currently, if the accessTime of a file changed before a snapshot is taken, 
> this accessTime will be captured in the snapshot, even if there is no other 
> modifications made to this file.
> Because of this, when we calculate snapshotDiff, more work need to be done 
> for this file, e,g,, metadataEquals method will be called, even if there is 
> no modification is made (thus not recorded to snapshotDiff). This can cause 
> snapshotDiff to slow down quite a lot when there are a lot of files to be 
> examined.
> This jira is to provide an option to skip capturing accessTime only change to 
> snapshot. Thus snapshotDiff can be done faster.
> When accessTime of a file changed, if there is other modification to the 
> file, the access time will still be captured in snapshot.
> Sometimes we want accessTime be captured to snapshot, such that when 
> restoring from the snapshot, we know the accessTime of this snapshot. So this 
> new feature is optional, and is controlled by a config property.
> Worth to mention is, how accurately the acessTime is captured is dependent on 
> the following config that has default value of 1 hour, which means new access 
> within an hour of previous access will not be captured.
> {code}
> public static final String  DFS_NAMENODE_ACCESSTIME_PRECISION_KEY =
>       
> HdfsClientConfigKeys.DeprecatedKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY;
> public static final long    DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT = 
> 3600000;
> {code}
> .



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to