[
https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872327#comment-15872327
]
Manoj Govindassamy commented on HDFS-11218:
-------------------------------------------
[~churromorales],
I am currently working on HDFS-11402 - HDFS Snapshots should capture
point-in-time copies of OPEN files - which can also help alleviate problems
around HDFS Snapshots and open files. Please take a look.
> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>
> Key: HDFS-11218
> URL: https://issues.apache.org/jira/browse/HDFS-11218
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> *Problem:*
> When files are being written while HDFS Snapshots are taken in parallel, the
> Snapshots do capture all of these files, but the files still being written
> do not have their point-in-time file length captured in the Snapshots.
> At the time of file close, or of any other metadata modification operation on
> a file that was previously open, HDFS reconciles the file length and
> records the modification in the last taken Snapshot. All the previously taken
> Snapshots continue to hold the same open file with no modification recorded.
> So, all those previous snapshots end up using the final modification record
> in the next available snapshot.
> *Proposal:*
> The HDFS Snapshot design goal was O(M) space usage for Snapshots, where M
> is the number of file modifications. So, it would be very expensive to record
> modifications for all the open files in all the snapshots. For applications
> that do not want to capture incomplete, partially written binary files
> in the snapshots, it would be preferable to have an extra option to skip open
> files. This way, they don't have to worry about restoring inconsistent files
> from the snapshots.
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
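
As a rough illustration of the behaviour described under *Problem* in the quoted issue, here is a toy Python model of a single file and its snapshot records. ToyFile, take_snapshot, length_in and the skip_open_files parameter are all hypothetical names made up for this sketch; it is not HDFS code and does not reflect the NameNode's actual data structures. The skip_open_files argument only mimics the intent of the proposed -skipOpenFiles option, which is a proposal in this issue and not an existing flag.

```python
from typing import Dict, List, Optional


class ToyFile:
    """Toy model of one file's snapshot records (assumption: not HDFS code)."""

    def __init__(self) -> None:
        self.length = 0
        self.is_open = True
        self.snapshots: List[str] = []      # snapshots containing this file, oldest first
        self.recorded: Dict[str, int] = {}  # snapshot name -> recorded length

    def write(self, nbytes: int) -> None:
        if not self.is_open:
            raise ValueError("file is closed")
        self.length += nbytes  # nothing is recorded while the file stays open

    def take_snapshot(self, name: str, skip_open_files: bool = False) -> None:
        if self.is_open and skip_open_files:
            return  # hypothetical skip option: leave the open file out entirely
        self.snapshots.append(name)  # no length is captured at snapshot time

    def close(self) -> None:
        self.is_open = False
        if self.snapshots:
            # Length is reconciled and recorded in the last taken snapshot only.
            self.recorded[self.snapshots[-1]] = self.length

    def length_in(self, name: str) -> Optional[int]:
        """Length seen through a snapshot, or None if the file is not in it."""
        if name not in self.snapshots:
            return None
        start = self.snapshots.index(name)
        # A snapshot without its own record uses the next available one.
        for later in self.snapshots[start:]:
            if later in self.recorded:
                return self.recorded[later]
        return self.length


f = ToyFile()
f.write(100)
f.take_snapshot("s1")   # true point-in-time length would be 100
f.write(50)
f.take_snapshot("s2")   # true point-in-time length would be 150
f.write(25)
f.close()
print(f.length_in("s1"), f.length_in("s2"))  # both report the length at close

g = ToyFile()
g.write(100)
g.take_snapshot("s1", skip_open_files=True)
g.close()
print(g.length_in("s1"))  # file is absent from the snapshot
```

In this model both s1 and s2 resolve to the length at close (175) instead of 100 and 150, which is the inconsistency the issue describes. With the skip option the open file simply does not appear in the snapshot, so there is nothing inconsistent to restore.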