[ https://issues.apache.org/jira/browse/HDFS-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Ivanov updated HDFS-9841:
------------------------------
Affects Version/s: 2.5.0
                   (was: 2.4.0)
                   (was: 2.3.0)
> FileDiff's skipped by hdfs snapshotDiff
> ---------------------------------------
>
> Key: HDFS-9841
> URL: https://issues.apache.org/jira/browse/HDFS-9841
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.5.0
> Reporter: Alex Ivanov
>
> Summary
> When a file in HDFS is read, its corresponding inode's accessTime field is
> updated. If the file is present in the last snapshot, the accessTime change
> causes a FileDiff to be added to the SnapshotDiff of the last snapshot.
> This behavior has the following problems:
> - Since FileDiffs reside in memory on the namenodes, snapshots become
> progressively more memory-heavy as the volume of data in HDFS increases. On a
> system with frequent updates, e.g. hourly, this becomes a big problem, since
> with, say, 2000 snapshots, one can have 2000 FileDiffs per file pointing to
> the same inode.
> - The FSImage grows tremendously in size, and the upload from the standby to
> the active namenode takes much longer.
> -The generated FileDiff does not contain any useful information that I can
> see. Since all FileDiffs for that file are pointing to the same inode, the
> accessTime they see is the same.-
> - I was wrong about the last point. Each FileDiff includes a SnapshotCopy
> attribute, which contains the updated accessTime. This may be a feature, but
> I'd question the value of having it enabled by default.
> Configuration:
> CDH 5.0.5 (Hadoop 2.3 / 2.4)
> We are NOT overriding the default parameter:
> DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT = 3600000;
> Note that this value (in milliseconds) determines how often the accessTime
> field may be updated: at most once an hour by default.
> How to reproduce:
> {code}
> [root@node1076]# hdfs dfs -ls /data/tenants/testenv.testtenant/wddata
> Found 3 items
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:52 /data/tenants/testenv.testtenant/wddata/folder1
> -rw-r--r-- 3 hdfs hadoop 38 2015-10-05 03:13 /data/tenants/testenv.testtenant/wddata/testfile1
> -rw-r--r-- 3 hdfs hadoop 21 2015-10-04 10:45 /data/tenants/testenv.testtenant/wddata/testfile2
> [root@node1076]# hdfs dfs -ls /data/tenants/testenv.testtenant/wddata/.snapshot
> Found 8 items
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:47 /data/tenants/testenv.testtenant/wddata/.snapshot/sn1
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:47 /data/tenants/testenv.testtenant/wddata/.snapshot/sn2
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:52 /data/tenants/testenv.testtenant/wddata/.snapshot/sn3
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:53 /data/tenants/testenv.testtenant/wddata/.snapshot/sn4
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:57 /data/tenants/testenv.testtenant/wddata/.snapshot/sn5
> drwxr-xr-x - hdfs hadoop 0 2015-10-04 10:58 /data/tenants/testenv.testtenant/wddata/.snapshot/sn6
> drwxr-xr-x - hdfs hadoop 0 2015-10-05 03:13 /data/tenants/testenv.testtenant/wddata/.snapshot/sn7
> drwxr-xr-x - hdfs hadoop 0 2015-10-05 04:20 /data/tenants/testenv.testtenant/wddata/.snapshot/sn8
> [root@node1076]# hdfs dfs -createSnapshot /data/tenants/testenv.testtenant/wddata sn9
> Created snapshot /data/tenants/testenv.testtenant/wddata/.snapshot/sn9
> [root@node1076]# hdfs snapshotDiff /data/tenants/testenv.testtenant/wddata sn8 sn9
> Difference between snapshot sn8 and snapshot sn9 under directory /data/tenants/testenv.testtenant/wddata:
> ################
> ## IMPORTANT: testfile1 was put into HDFS more than 1 hour ago, which triggers the accessTime update.
> ################
> [root@node1076]# hdfs dfs -cat /data/tenants/testenv.testtenant/wddata/testfile1
> This is test file 1, but now it's 11.
> [root@node1076]# hdfs dfs -createSnapshot /data/tenants/testenv.testtenant/wddata sn10
> Created snapshot /data/tenants/testenv.testtenant/wddata/.snapshot/sn10
> [root@node1076]# hdfs snapshotDiff /data/tenants/testenv.testtenant/wddata sn9 sn10
> Difference between snapshot sn9 and snapshot sn10 under directory /data/tenants/testenv.testtenant/wddata:
> M ./testfile1
> {code}
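
The timing rule behind the reproduction above can be sketched in plain shell. This is a simplified model, not NameNode code. The function names atime_would_update and worst_case_filediffs are hypothetical, and the comparison "now > last + precision" is an assumption about how the precision window is applied. Treating a precision of 0 as "access times disabled" follows the documented meaning of dfs.namenode.accesstime.precision.

```shell
#!/usr/bin/env bash
# Sketch only: models the timing rule described in the report, not actual
# NameNode code. Function names are hypothetical.

# Default quoted in the report (milliseconds, i.e. one hour).
PRECISION_MS=3600000

# atime_would_update LAST_ATIME_MS NOW_MS [PRECISION_MS]
# Exits 0 if a read at NOW_MS is assumed to refresh accessTime, 1 otherwise.
# Assumption: a precision of 0 means access times are disabled.
atime_would_update() {
  local last="$1" now="$2" precision="${3:-$PRECISION_MS}"
  (( precision > 0 && now > last + precision ))
}

# worst_case_filediffs SNAPSHOTS FILES
# Upper bound on FileDiffs if every file has its accessTime refreshed
# between every pair of consecutive snapshots.
worst_case_filediffs() {
  echo $(( $1 * $2 ))
}

# Example: a file last accessed at t=0 and read just over an hour later.
if atime_would_update 0 3600001; then
  echo "read refreshes accessTime; the next snapshotDiff would list the file as M"
fi
```

Under this model, a read inside the one-hour window leaves accessTime alone, which would be consistent with the empty sn8/sn9 diff, while a read after the window produces the "M ./testfile1" entry seen between sn9 and sn10.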
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)