[ 
https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969728#comment-15969728
 ] 

Andrew Wang commented on HDFS-11402:
------------------------------------

Hi Manoj, thanks for working on this. I gave it a nit pass; I'm still 
thinking about the high-level idea, but wanted to post this before the weekend:

* typo: FRREZE -> FREEZE
* Maybe name it "dfs.namenode.snapshot.capture-file-length" to be more 
explanatory?
* hdfs-default.xml: hflush/hsync also do not update the NN's view of the file 
length unless the client calls hsync with a special flag.
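To make the naming/documentation suggestion concrete, the hdfs-default.xml entry might look like the fragment below. The property name here is the suggested rename, and the description text is only a sketch of what I'd expect it to say; neither is what the current patch actually uses:

```xml
<property>
  <!-- Hypothetical name per the suggestion above; the patch currently
       calls it dfs.namenode.snapshot.freeze.openfiles. -->
  <name>dfs.namenode.snapshot.capture-file-length</name>
  <value>false</value>
  <description>
    If true, taking a snapshot also records the current length of files
    that are open for write, so each snapshot keeps a point-in-time view
    of them. Note that hflush/hsync do not update the NameNode's view of
    the file length unless the client calls hsync with the special
    UPDATE_LENGTH flag.
  </description>
</property>
```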

INodesInPath
* Not a fan of Pair, since it doesn't document its entries. Could we instead 
add a helper method that, given an INode[], returns a byte[][]?
* We made an effort to avoid resolving paths -> IIPs and back during RPC 
handling for performance reasons. This is a special case for the lease manager, 
which only has an INode ID. Can you add a comment saying as much for the newly 
added methods?
* isDescendant javadoc: could you reverse it to say "if this INodesInPath is a 
descendant of the specified INodeDirectory"?
* The private isDescendant is only called by the public one; combine them into 
one? This way we also don't lose the type safety of the INodeDirectory.
* isDescendant, if we're an IIP, can we simply look in our array of INodes for 
the specified ancestor? This method looks expensive right now.
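To make the helper-method and array-scan suggestions concrete, here is a rough self-contained sketch. {{IipSketch}}, its nested stand-in classes, and the method names are hypothetical simplifications for illustration, not the real HDFS types:

```java
public class IipSketch {

    // Simplified stand-ins for HDFS's INode / INodeDirectory (hypothetical).
    static class INode {
        final String name;
        INode(String name) { this.name = name; }
        byte[] getLocalNameBytes() { return name.getBytes(); }
    }
    static class INodeDirectory extends INode {
        INodeDirectory(String name) { super(name); }
    }

    // Instead of an undocumented Pair, a helper that turns a resolved
    // INode[] into its byte[][] path components.
    static byte[][] getPathComponents(INode[] inodes) {
        byte[][] components = new byte[inodes.length][];
        for (int i = 0; i < inodes.length; i++) {
            components[i] = inodes[i].getLocalNameBytes();
        }
        return components;
    }

    // isDescendant as a plain scan of the already-resolved inode array: if
    // the ancestor directory appears anywhere on the path, this path is a
    // descendant of it. Identity comparison is enough here, assuming the
    // NameNode has one in-memory object per inode.
    static boolean isDescendant(INode[] path, INodeDirectory ancestor) {
        for (INode inode : path) {
            if (inode == ancestor) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        INodeDirectory root = new INodeDirectory("");
        INodeDirectory snapDir = new INodeDirectory("snapdir");
        INode file = new INode("file1");
        INode[] path = { root, snapDir, file };
        System.out.println(isDescendant(path, snapDir));                 // true
        System.out.println(isDescendant(path, new INodeDirectory("x"))); // false
    }
}
```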

LeaseManager
* What are the locking requirements for the new methods? add asserts?
* Is the threading important for performance? Was this benchmarked? I suspect 
it's slower for small numbers of open files.
* Can call {{shutdown}} immediately after adding all the futures to the 
executor.
* Currently we swallow the exception when a future.get throws. If we return a 
partial set, won't the snapshot be inaccurate?
* ancestorDir isn't mutated, so can we move the ancestorDir null check out of 
the Callable?
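On the last three points, here's a minimal sketch of the submit / shutdown / get shape I have in mind. The class, the {{Long}} inode-id payload, and the method names are hypothetical illustrations, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OpenFileCollector {

    // Collect open-file ids from several partitioned tasks.
    static List<Long> collect(List<Callable<List<Long>>> tasks) {
        // Validate arguments once up front, rather than repeating the check
        // inside every Callable.
        if (tasks == null) {
            throw new IllegalArgumentException("tasks must not be null");
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<Long>>> futures = new ArrayList<>();
        for (Callable<List<Long>> task : tasks) {
            futures.add(pool.submit(task));
        }
        // shutdown() right after submitting is safe: it only stops the pool
        // from accepting new tasks; already-submitted ones run to completion.
        pool.shutdown();
        List<Long> result = new ArrayList<>();
        for (Future<List<Long>> f : futures) {
            try {
                result.addAll(f.get());
            } catch (InterruptedException | ExecutionException e) {
                // Propagate instead of swallowing: a partial set would make
                // the snapshot silently inaccurate.
                throw new IllegalStateException("failed to collect open files", e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Callable<List<Long>>> tasks = Arrays.asList(
            () -> Arrays.asList(1L, 2L),
            () -> Arrays.asList(3L));
        // Futures are drained in submission order.
        System.out.println(collect(tasks)); // [1, 2, 3]
    }
}
```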

* testCheckLease, why is there a new sleep(1)? Thread sleeps in a unit test are 
a code smell.

> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch, 
> HDFS-11402.03.patch, HDFS-11402.04.patch
>
>
> *Problem:*
> 1. When files are being written and HDFS Snapshots are taken in parallel, 
> the Snapshots do capture all these files, but the captured copies do not 
> record the point-in-time file length. That is, these open files are not 
> frozen in HDFS Snapshots; they grow/shrink in length, just like the 
> original file, even after the snapshot time.
> 2. At the time of file close, or any other metadata modification operation 
> on these files, HDFS reconciles the file length and records the 
> modification in the last taken Snapshot. All the previously taken 
> Snapshots continue to hold those open files with no modification recorded, 
> so they all end up using the final modification record in the last 
> snapshot. Thus, after the file close, the file lengths in all those 
> snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB written to it. While the 
> writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal*
> 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can 
> optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze 
> open files. 
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult 
> {{LeaseManager}} and get a list of INodesInPath for all open files under 
> the snapshot dir. 
> 3. After the Snapshot creation, Diff creation, and modification time 
> update, {{DirectorySnapshottableFeature#addSnapshot}} can invoke 
> {{INodeFile#recordModification}} for each of the open files. This way, the 
> Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured 
> for each of the open files. 
> 4. The above model follows the current Snapshot and Diff protocols and 
> doesn't introduce any new disk formats. So, I don't think we will need 
> any new FSImage Loader/Saver changes for Snapshots.
> 5. One of the design goals of HDFS Snapshots was the ability to take any 
> number of snapshots in O(1) time. Though LeaseManager keeps all the open 
> files with leases in an in-memory map, an iteration is still needed to 
> prune the relevant open files and then run recordModification on each of 
> them. So, it will not be strictly O(1) with the above proposal. But the 
> increase will be only marginal, as the new order will be 
> O(open_files_under_snap_dir). In order to avoid changing the HDFS Snapshot 
> behavior for open files and its time complexity, this improvement can be 
> made under a new config {{"dfs.namenode.snapshot.freeze.openfiles"}}, 
> which by default can be {{false}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
