Manoj Govindassamy created HDFS-11218:
-----------------------------------------
Summary: Add option to skip open files during HDFS Snapshots
Key: HDFS-11218
URL: https://issues.apache.org/jira/browse/HDFS-11218
Project: Hadoop HDFS
Issue Type: Improvement
Components: snapshots
Affects Versions: 3.0.0-alpha1
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
Problem:
When files are being written while HDFS Snapshots are taken in parallel, the
Snapshots do capture all these files, but the point-in-time file length is not
captured for the files that are still being written.
At the time of file close, or any other metadata modification operation on a
file that was previously open, HDFS reconciles the file length and records the
modification in the last taken Snapshot. All the previously taken Snapshots
continue to have the same open file with no modification recorded. So, all
those previous Snapshots end up using the final modification record in the next
available Snapshot.
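The behavior described above can be illustrated with a small toy model. This is
only a sketch of the bookkeeping as described in this issue, not HDFS code: the
class ToySnapshottedFile and all of its methods are hypothetical names, and the
lookup rule is an assumption drawn from the description.

```python
# Toy model only: this is NOT HDFS code. All names here are hypothetical.
class ToySnapshottedFile:
    def __init__(self):
        self.length = 0       # current (live) length of the file
        self.snapshots = []   # snapshot ids, in creation order
        self.recorded = {}    # snapshot id -> length recorded for that snapshot

    def write(self, nbytes):
        # While the file is open, no modification is recorded in any snapshot.
        self.length += nbytes

    def create_snapshot(self):
        sid = len(self.snapshots)
        self.snapshots.append(sid)
        return sid

    def close(self):
        # On close, the length is reconciled and recorded only in the
        # last taken snapshot.
        if self.snapshots:
            self.recorded[self.snapshots[-1]] = self.length

    def length_in_snapshot(self, sid):
        # A snapshot without its own record uses the next available record
        # in a later snapshot, falling back to the live length.
        for later in self.snapshots[sid:]:
            if later in self.recorded:
                return self.recorded[later]
        return self.length


f = ToySnapshottedFile()
f.write(100)
s0 = f.create_snapshot()   # point-in-time length would be 100
f.write(100)
s1 = f.create_snapshot()   # point-in-time length would be 200
f.write(300)
f.close()                  # final length 500, recorded only for s1

print(f.length_in_snapshot(s0), f.length_in_snapshot(s1))  # prints "500 500"
```

In this model both snapshots report the final length of 500 rather than the
100 and 200 bytes that existed when they were taken.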
Proposal:
The HDFS Snapshot design goal was to have O(M) space usage for Snapshots, where
M is the number of file modifications. So, it would be very expensive to record
modifications for all the open files in all the snapshots. For applications
that do not want to capture incomplete, partially written binary files in
the snapshots, it would be preferable to have an extra option to skip open
files. This way, they don't have to worry about restoring inconsistent files
from the snapshots.
{noformat}
hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
{noformat}
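The intended effect of the proposed option can be sketched with a toy model.
This is an illustration of the semantics only, not HDFS code: the function
create_snapshot, its skip_open_files parameter, and the example paths are all
hypothetical.

```python
# Toy model only: this is NOT HDFS code. All names and paths are hypothetical.
def create_snapshot(files, skip_open_files=False):
    """Return a {path: length} view of the files captured by a snapshot.

    files maps a path to {"length": int, "open": bool}.
    """
    snapshot = {}
    for path, meta in files.items():
        if skip_open_files and meta["open"]:
            continue  # leave files that are still being written out
        snapshot[path] = meta["length"]
    return snapshot


files = {
    "/data/part-0000": {"length": 4096, "open": False},
    "/data/part-0001": {"length": 1024, "open": True},  # still being written
}

default_snap = create_snapshot(files)
skipping_snap = create_snapshot(files, skip_open_files=True)

print(sorted(default_snap))   # both files are captured
print(sorted(skipping_snap))  # only the closed file is captured
```

With the option enabled, a restore from the snapshot never sees the partially
written file.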