[ https://issues.apache.org/jira/browse/HDFS-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137209#comment-16137209 ]
Manoj Govindassamy commented on HDFS-12191: ------------------------------------------- Thanks for working on this [~yzhangal]. My comments below. 1. {{DFSConfigKeys}} {noformat} DFS_NAMENODE_DONT_CAPTURE_ACCESSTIME_ONLY_CHANGE_IN_SNAPSHOT = "dfs.namenode.dont.capture.accesstime.only.change.in.snapshot" {noformat} 1.1 Any other better name for the above new config key? We have one other snapshot related config with the prefix "dfs.namenode.snapshot...". May be we should have all snapshot related one under the same prefix for grouping and consistency. How about "dfs.namenode.snapshot.skip.accesstime-only-diff. Your thoughts? 1.2 Config and the default value lines have checkstyle issue. can you please take care of this? 2. {{DirectoryWithSnapshotFeature}} 2.1 new blank line added to line 50 is not needed 2.2 {noformat} } else { if (!dirCopy.metadataEquals(sdiff.snapshotINode)) { dirMetadataChanged = true; } } {noformat} Is this different compared to the existing "else if" block. Doesn't look so. Can be reverted. 3. {{hdfs-default.xml}} 3.1 Once the config key is changed, need to be incorporated here 3.2 "..it will not be captured in next snapshot." sounds ambiguous. The access time change history is not preserved. The file is still part of snapshot. Can we reword this? Also, there is typo in "lastest" 4. {{class INode}} {noformat} private static boolean dontCaptureAccessTimeOnlyChangeInSnapshot = false; public static void setDontCaptureAccessTimeOnlyChangeInSnapshot(boolean s) { LOG.info("Setting dontCaptureAccessTimeOnlyChangeInSnapshot to " + s); dontCaptureAccessTimeOnlyChangeInSnapshot = s; } {noformat} 4.1 Any other better ways of doing this instead of adding a static member to the core INode class. Snapshot manager can be the one to read all the configuration related to snapshots and it can take the decision accordingly instead of having the logic at the INode level {noformat} public static boolean getDontCaptureAccessTimeOnlyChangeInSnapshot() { return dontCaptureAccessTimeOnlyChangeInSnapshot; } {noformat} 4.2 Above method is never used. 5. {{FSDirAttrOp}} {noformat} static boolean unprotectedSetTimes( FSDirectory fsd, INodesInPath iip, long mtime, long atime, boolean force) throws QuotaExceededException { .... if (mtime != -1) { inode = inode.setModificationTime(mtime, latest); status = true; } ... if (atime != -1 && (status || force || atime > inode.getAccessTime() + fsd.getAccessTimePrecision())) { inode.setAccessTime(atime, latest); status = true; } } {noformat} >> "if there is other modification made to the file, the latest access time will be captured together with the modification in next snapshot." Looks like accessTime will always be skipped with the config turned on. Please correct me if my understanding from the code is wrong. > Provide option to not capture the accessTime change of a file to snapshot if > no other modification has been done > ---------------------------------------------------------------------------------------------------------------- > > Key: HDFS-12191 > URL: https://issues.apache.org/jira/browse/HDFS-12191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 3.0.0-beta1 > Reporter: Yongjun Zhang > Assignee: Yongjun Zhang > Attachments: HDFS-12191.001.patch > > > Currently, if the accessTime of a file changed before a snapshot is taken, > this accessTime will be captured in the snapshot, even if there is no other > modifications made to this file. > Because of this, when we calculate snapshotDiff, more work need to be done > for this file, e,g,, metadataEquals method will be called, even if there is > no modification is made (thus not recorded to snapshotDiff). This can cause > snapshotDiff to slow down quite a lot when there are a lot of files to be > examined. > This jira is to provide an option to skip capturing accessTime only change to > snapshot. Thus snapshotDiff can be done faster. > When accessTime of a file changed, if there is other modification to the > file, the access time will still be captured in snapshot. > Sometimes we want accessTime be captured to snapshot, such that when > restoring from the snapshot, we know the accessTime of this snapshot. So this > new feature is optional, and is controlled by a config property. > Worth to mention is, how accurately the acessTime is captured is dependent on > the following config that has default value of 1 hour, which means new access > within an hour of previous access will not be captured. > {code} > public static final String DFS_NAMENODE_ACCESSTIME_PRECISION_KEY = > > HdfsClientConfigKeys.DeprecatedKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY; > public static final long DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT = > 3600000; > {code} > . -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org