[
https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872315#comment-15872315
]
Manoj Govindassamy commented on HDFS-11220:
-------------------------------------------
HDFS-11402 (HDFS Snapshots should capture point-in-time copies of OPEN files)
can help solve this issue as well. I will add more tests and cases as part of
this bug once HDFS-11402 is resolved.
> SnapshotDiffReport should detect open files in HDFS Snapshots
> -------------------------------------------------------------
>
> Key: HDFS-11220
> URL: https://issues.apache.org/jira/browse/HDFS-11220
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> *Problem:*
> 1. When files are being written while HDFS Snapshots are taken in parallel,
> the Snapshots do capture all of these files, but the files still being
> written do not have their point-in-time length captured in the Snapshots.
> Most of the time, these open files will have a length of 0 or the size at
> the last block boundary.
> 2. Only at file close, or at any other metadata modification operation on
> these files, does HDFS reconcile the file length and record the
> modification in the last taken Snapshot. All previously taken Snapshots
> continue to hold those open files with no modification recorded, so they
> end up using the final modification record in the next available Snapshot.
> As a result, after the file is closed, the file lengths in all those
> Snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While
> the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So, a Snapshot Diff Report run against any of the above snapshots will not
> detect any changes in the open files.
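
To make the timeline above concrete, here is a minimal toy model of the behaviour described in the Problem section. It is only a sketch under stated assumptions, not the NameNode implementation: the names `ToyNamespace`, `length_in`, and `replay_timeline` are hypothetical, the open file's visible length is assumed to stay at 0 until close, and the two writes are assumed to be 512KB each.

```python
MB = 1024 * 1024


class ToyNamespace:
    """Toy model (hypothetical) of one open file and its snapshots."""

    def __init__(self):
        self.snapshots = []      # snapshot names, in creation order
        self.visible_length = 0  # length the namespace knows about
        self.pending = 0         # bytes written but not yet reconciled
        self.records = {}        # snapshot index -> recorded file length

    def write(self, nbytes):
        # Assumption: writes to an open file are not visible until close.
        self.pending += nbytes

    def snapshot(self, name):
        # Taking a snapshot records nothing for the open file.
        self.snapshots.append(name)

    def close(self):
        # Reconcile the length, then record the modification only in the
        # last taken snapshot, as the issue description states.
        self.visible_length += self.pending
        self.pending = 0
        if self.snapshots:
            self.records[len(self.snapshots) - 1] = self.visible_length

    def length_in(self, name):
        # A snapshot with no record of its own uses the record in the next
        # available snapshot, and otherwise the current file state.
        start = self.snapshots.index(name)
        for i in range(start, len(self.snapshots)):
            if i in self.records:
                return self.records[i]
        return self.visible_length


def replay_timeline():
    """Replay T2..T4 from the diagram and return the observed lengths."""
    ns = ToyNamespace()
    observed = {}
    ns.write(MB // 2)
    ns.snapshot("Snap1")
    observed["T2"] = {s: ns.length_in(s) for s in ns.snapshots}
    ns.write(MB // 2)
    ns.snapshot("Snap2")
    observed["T3"] = {s: ns.length_in(s) for s in ns.snapshots}
    ns.close()
    ns.snapshot("Snap3")
    observed["T4"] = {s: ns.length_in(s) for s in ns.snapshots}
    return observed


if __name__ == "__main__":
    for time, lengths in replay_timeline().items():
        print(time, lengths)
```

In this model every snapshot reports the same length at T4, so comparing any two of them by length alone shows no change for File1.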
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record.
> 2. Since the NameNode might not have accurate byte-level visibility into the
> length of open files, Snapshots might not have the accurate point-in-time
> length captured. So, SnapshotDiffReport can have an option to detect open
> files and always show the {{M}} flag for them, if the files are present in
> both of the snapshots it is run against.
> {noformat}
> hdfs snapshotDiff -includeOpenFiles <snapDir> <snapName> <snapName>
> {noformat}
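
The proposed option could behave roughly as sketched below. This is an illustration only: `-includeOpenFiles` is the flag proposed in this issue, not an existing option, and `FileState`, `toy_snapshot_diff`, and `include_open_files` are hypothetical names. The sketch assumes each snapshot stashes an open/closed marker per file (proposal 1) and treats a file as open if it is marked open in either snapshot. Renames are left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileState(NamedTuple):
    """Hypothetical per-file details stashed in a snapshot."""
    length: int
    is_open: bool


def toy_snapshot_diff(
    from_snap: Dict[str, FileState],
    to_snap: Dict[str, FileState],
    include_open_files: bool = False,
) -> List[Tuple[str, str]]:
    """Return (flag, path) entries using the report's +, - and M symbols."""
    report = []
    for path in sorted(set(from_snap) | set(to_snap)):
        before = from_snap.get(path)
        after = to_snap.get(path)
        if before is None:
            report.append(("+", path))   # created after from_snap
        elif after is None:
            report.append(("-", path))   # deleted before to_snap
        elif before.length != after.length:
            report.append(("M", path))   # recorded lengths differ
        elif include_open_files and (before.is_open or after.is_open):
            # Recorded lengths match, but the file was open, so the
            # lengths cannot be trusted: always report it as modified.
            report.append(("M", path))
    return report


if __name__ == "__main__":
    snap1 = {"/dir/File1": FileState(length=0, is_open=True)}
    snap2 = {"/dir/File1": FileState(length=0, is_open=True)}
    print(toy_snapshot_diff(snap1, snap2))
    print(toy_snapshot_diff(snap1, snap2, include_open_files=True))
```

Without the option the two snapshots look identical for File1. With it, File1 is reported with {{M}} even though both recorded lengths are 0.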
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)