[ https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796535#comment-15796535 ]
churro morales commented on HDFS-11218:
---------------------------------------
This seems quite useful. Are you guys working on this patch currently?
> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>
> Key: HDFS-11218
> URL: https://issues.apache.org/jira/browse/HDFS-11218
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> *Problem:*
> When files are being written while HDFS Snapshots are taken in parallel,
> the Snapshots do capture all of these files, but the point-in-time file
> length is not captured for the files that are still being written.
> When such a previously open file is closed, or any other metadata
> modification is performed on it, HDFS reconciles the file length and
> records the modification in the last taken Snapshot. All the previously taken
> Snapshots continue to have the same open file with no modification recorded.
> So, all those previous snapshots end up using the final modification record
> in the next available snapshot.
> *Proposal:*
> The design goal for HDFS Snapshots was O(M) space usage, where M
> is the number of file modifications. So, it would be very expensive to record
> modifications for all the open files in all the snapshots. For applications
> that do not want to capture incomplete or partially written binary files
> in the snapshots, it would be preferable to have an extra option to skip open
> files. This way, they don't have to worry about restoring inconsistent files
> from the snapshots.
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
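
To make the proposed behaviour concrete, here is a minimal toy model of it in Python. It is only a sketch and does not use any HDFS API: `FileState`, `take_snapshot`, and the `skip_open_files` parameter are hypothetical names invented for illustration, and representing "no point-in-time length captured" as `None` is a simplifying assumption about the problem described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileState:
    """Toy stand-in for a file's metadata (hypothetical, not an HDFS class)."""
    length: int    # bytes written so far
    is_open: bool  # True while a writer still holds the file open


def take_snapshot(files: Dict[str, FileState],
                  skip_open_files: bool = False) -> Dict[str, Optional[int]]:
    """Return {path: recorded_length} for a simulated snapshot.

    An open file is recorded with length None, modelling the problem in the
    issue: the snapshot includes the file but has no point-in-time length.
    With skip_open_files=True, open files are left out of the snapshot.
    """
    snapshot: Dict[str, Optional[int]] = {}
    for path, state in files.items():
        if state.is_open:
            if skip_open_files:
                continue  # proposed option: omit files still being written
            snapshot[path] = None  # captured, but length is not pinned
        else:
            snapshot[path] = state.length
    return snapshot


# Example namespace: one closed file and one file still being written.
live = {
    "/data/closed.bin": FileState(length=1024, is_open=False),
    "/data/writing.bin": FileState(length=300, is_open=True),
}

default_snap = take_snapshot(live)
skipping_snap = take_snapshot(live, skip_open_files=True)
print(default_snap)
print(skipping_snap)
```

In this model the default snapshot contains the open file without a usable length, while the skipping snapshot contains only the closed file, so nothing restored from it can be a partially written file.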