[ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872785#comment-15872785 ]

Manoj Govindassamy commented on HDFS-11402:
-------------------------------------------

Thanks for the design review [~jingzhao]. Much appreciated. 

bq. I think the key challenge here is how to let NN know the lengths of open 
files...If we choose to record l_n in the snapshot, then later we may have risk 
to lose data (from client's point of view).

We have the same problem with parallel readers when there is an ongoing write. 
Readers of the file have no way of knowing the latest length or data. But there 
is always {{SyncFlag.UPDATE_LENGTH}}: applications that want to update the NN 
with their latest writes can call {{hsync()}} with that SyncFlag option. The 
same existing mechanism helps Snapshots as well, since in the above case the 
open file length captured in the snapshot would be the last hsync() length.
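For illustration, a minimal writer sketch (the file path and chunk sizes are 
made up; the {{hsync(EnumSet<SyncFlag>)}} overload is the existing one on 
{{HdfsDataOutputStream}}, and the cast assumes the stream comes from a 
{{DistributedFileSystem}}):

{code:java}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class UpdateLengthWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/tmp/openfile.dat");          // illustrative path
    HdfsDataOutputStream out = (HdfsDataOutputStream) fs.create(file);
    byte[] chunk = new byte[64 * 1024];
    for (int i = 0; i < 4; i++) {
      out.write(chunk);
      // Flush to DataNodes AND tell the NameNode the new length,
      // so parallel readers (and snapshots) can see the synced size.
      out.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    }
    out.close();
  }
}
{code}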

Otherwise, applications that are attempting to ensure consistent lengths for 
files in HDFS snapshots have no reliable way to do so. IMHO, HDFS Snapshots 
violate the important design goal of {{read-only}} behavior, as files in 
the snapshot grow in size after the snapshot time. This mutable snapshot 
behavior makes HDFS Snapshots far less attractive and prevents applications 
from making use of the feature in a useful way. 

bq. I think we first need to solve the problem about how to report the length 
of open files to NN (e.g., maybe utilizing the DN heartbeats or some other 
ways).

I can dig deeper into this model to make HDFS Snapshots much more reliable. But 
given that applications are already aware of the NN length issue for open 
files, that {{hsync(SyncFlag.UPDATE_LENGTH)}} is available to close the same 
gap, and that the proposed design is behind a config which is turned off by 
default, do we still need to make this jira dependent on the _fixing open file 
lengths via heartbeat_ improvement? Would love to hear your thoughts on this. 



> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch
>
>
> *Problem:*
> 1. When HDFS Snapshots are taken while files are being written in parallel, 
> the Snapshots do capture all these files, but the point-in-time length of the 
> files under write is not captured. That is, these open files are not frozen 
> in HDFS Snapshots: they grow/shrink in length, just like the original file, 
> even after the snapshot time.
> 2. At the time of file close, or of any other metadata modification operation 
> on these files, HDFS reconciles the file length and records the modification 
> in the last taken Snapshot. All the previously taken Snapshots continue to 
> hold those open files with no modification recorded, so all of them end up 
> using the final modification record in the last snapshot. Thus, after the 
> file close, the file lengths in all those snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While 
> the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then, at each time:
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
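> A quick way to observe the above (a sketch only; it assumes {{/snapdir}} has 
> already been made snapshottable via {{hdfs dfsadmin -allowSnapshot /snapdir}}, 
> and the file name is made up):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class SnapshotLengthDemo {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path file = new Path("/snapdir/File1");
>     FSDataOutputStream out = fs.create(file);
>     out.write(new byte[512 * 1024]);   // 0.5MB written, file still open
>     fs.createSnapshot(new Path("/snapdir"), "Snap1");
>     out.write(new byte[512 * 1024]);   // file grows to 1MB
>     out.close();                       // close reconciles the length
>     // Snapshot contents are read back under <dir>/.snapshot/<name>.
>     long snapLen = fs.getFileStatus(
>         new Path("/snapdir/.snapshot/Snap1/File1")).getLen();
>     // Point-in-time expectation: the length at snapshot time.
>     // Observed today: 1MB, i.e. the final length after close.
>     System.out.println("Snap1/File1 length = " + snapLen);
>   }
> }
> {code}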
> *Proposal*
> 1. At the time of taking a Snapshot, {{SnapshotManager#createSnapshot}} can 
> optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze 
> open files. 
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult the 
> {{LeaseManager}} and get a list of {{INodesInPath}} for all the open files 
> under the snapshot dir. 
> 3. After the Snapshot creation, Diff creation and modification-time update, 
> {{DirectorySnapshottableFeature#addSnapshot}} can invoke 
> {{INodeFile#recordModification}} for each of the open files (see the sketch 
> after this list). This way, the Snapshot just taken will have a {{FileDiff}} 
> with the {{fileSize}} captured for each of the open files. 
> 4. The above model follows the current Snapshot and Diff protocols and 
> doesn't introduce any new on-disk formats. So, I don't think we will need any 
> new FSImage Loader/Saver changes for Snapshots.
> 5. One of the design goals of HDFS Snapshot was the ability to take any 
> number of snapshots in O(1) time. Though the LeaseManager holds all the open 
> files with leases in an in-memory map, an iteration is still needed to prune 
> out the relevant open files and then run recordModification on each of them. 
> So, it will not be strictly O(1) with the above proposal. But it's only going 
> to be a marginal increase, as the new order will be 
> O(open_files_under_snap_dir). In order to avoid changing the behavior of HDFS 
> Snapshots for open files, and to avoid the change in time complexity, this 
> improvement can be made under a new config 
> {{"dfs.namenode.snapshot.freeze.openfiles"}}, which by default can be 
> {{false}}.
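> A minimal sketch of the proposed freeze flow (toy Java modeling the idea, not 
> actual NameNode code; the helper names and data structures here are 
> assumptions for illustration only):
> {code:java}
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> // Toy model of the proposed flow; the real logic would live in
> // DirectorySnapshottableFeature#addSnapshot on the NameNode.
> public class FreezeOpenFilesSketch {
>   static class OpenFile {
>     final String path;
>     long length;                       // last length known to the NN
>     final Map<String, Long> diffs = new HashMap<>(); // snapshot -> fileSize
>     OpenFile(String path, long len) { this.path = path; this.length = len; }
>     // Stand-in for INodeFile#recordModification: capture a FileDiff
>     // with the current fileSize in the snapshot just taken.
>     void recordModification(String snapshotName) {
>       diffs.put(snapshotName, length);
>     }
>   }
>
>   // Stand-in for consulting the LeaseManager's in-memory map and
>   // pruning to the open files under the snapshot dir.
>   static List<OpenFile> openFilesUnder(List<OpenFile> leases, String dir) {
>     List<OpenFile> result = new ArrayList<>();
>     for (OpenFile f : leases) {
>       if (f.path.startsWith(dir + "/")) {
>         result.add(f);
>       }
>     }
>     return result;
>   }
>
>   // Stand-in for the addSnapshot hook, gated by the proposed
>   // dfs.namenode.snapshot.freeze.openfiles config (default false).
>   static void addSnapshot(List<OpenFile> leases, String snapDir,
>                           String snapshotName, boolean freezeOpenFiles) {
>     if (!freezeOpenFiles) {
>       return;  // default: current behavior, open files stay mutable
>     }
>     // O(open_files_under_snap_dir), as noted in point 5 above.
>     for (OpenFile f : openFilesUnder(leases, snapDir)) {
>       f.recordModification(snapshotName);
>     }
>   }
>
>   public static void main(String[] args) {
>     List<OpenFile> leases = new ArrayList<>();
>     OpenFile f1 = new OpenFile("/snapdir/File1", 0);
>     leases.add(f1);
>     addSnapshot(leases, "/snapdir", "Snap1", true);
>     f1.length = 1024 * 1024;           // writes continue after the snapshot
>     System.out.println("Snap1 frozen length = " + f1.diffs.get("Snap1")); // 0
>   }
> }
> {code}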


