[
https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969728#comment-15969728
]
Andrew Wang commented on HDFS-11402:
------------------------------------
Hi Manoj, thanks for working on this. I gave it a nit pass. I'm still thinking
about the high-level idea, but wanted to post this before the weekend:
* typo: FRREZE -> FREEZE
* Maybe name it "dfs.namenode.snapshot.capture-file-length" to be more
explanatory?
* hdfs-default.xml: hflush/hsync also do not update the NN's view of the file
length unless the client calls hsync with a special flag.
INodesInPath
* Not a fan of Pair since it doesn't document its entries. Could we instead add
a helper method that, given an INode[], returns a byte[][]?
* We made an effort to avoid resolving paths -> IIPs and back during RPC
handling for performance reasons. This is a special case for the lease manager,
which only has an INode ID. Can you add a comment saying as much for the newly
added methods?
* isDescendant javadoc: could you reverse it to say "if this INodesInPath is a
descendant of the specified INodeDirectory"?
* The private isDescendant is only called by the public one. Can we combine
them into one? That way we also keep the type safety of the INodeDirectory.
* isDescendant, if we're an IIP, can we simply look in our array of INodes for
the specified ancestor? This method looks expensive right now.
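A minimal sketch of the two INodesInPath suggestions above (a named helper in place of Pair, and an ancestor check that scans the already-resolved array). The types here are simplified stand-ins, not the real HDFS classes: {{IipSketch}}, {{Node}} and {{toPathComponents}} are hypothetical names, and treating a path as not being its own descendant is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for INodesInPath; not the real HDFS class. */
final class IipSketch {
    /** Simplified stand-in for INode: only the local name is modeled. */
    static final class Node {
        final byte[] localName;

        Node(String name) {
            this.localName = name.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Resolved nodes, ordered from the root down to the target. */
    private final Node[] nodes;

    IipSketch(Node... nodes) {
        this.nodes = nodes;
    }

    /** Named helper instead of a Pair: maps resolved nodes to their path components. */
    static byte[][] toPathComponents(Node[] nodes) {
        byte[][] components = new byte[nodes.length][];
        for (int i = 0; i < nodes.length; i++) {
            components[i] = nodes[i].localName;
        }
        return components;
    }

    /**
     * Returns true if this path is a descendant of the specified directory.
     * Only the already-resolved array is scanned; nothing is re-resolved.
     */
    boolean isDescendant(Node ancestorDir) {
        if (ancestorDir == null) {
            return false;
        }
        // Assumption: the last element is the path itself, so it is skipped.
        for (int i = 0; i < nodes.length - 1; i++) {
            if (nodes[i] == ancestorDir) { // identity comparison, no name matching
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node("");
        Node dir = new Node("dir");
        Node file = new Node("file");
        IipSketch iip = new IipSketch(root, dir, file);
        System.out.println(iip.isDescendant(dir));
        System.out.println(toPathComponents(new Node[] {root, dir, file}).length);
    }
}
```

The scan is linear in the path depth and allocates nothing, which is the cost profile the last bullet is asking for.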
LeaseManager
* What are the locking requirements for the new methods? Could we add asserts?
* Is the threading important for performance? Was this benchmarked? I suspect
it's slower for small numbers of open files.
* We can call {{shutdown}} immediately after adding all the futures to the
executor.
* Currently we swallow the exception when a future.get throws. If we return a
partial set, won't the snapshot be inaccurate?
* ancestorDir isn't mutated, so can we move the ancestorDir null check out of
the Callable?
* testCheckLease, why is there a new sleep(1)? Thread sleeps in a unit test are
a code smell.
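To illustrate the executor points above, here is a minimal sketch using only java.util.concurrent. It is not the code from the patch: {{OpenFileScanSketch}} and {{runAll}} are hypothetical names, and the tasks are placeholders for whatever the Callables in LeaseManager compute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; the names are not taken from the patch. */
final class OpenFileScanSketch {
    /** Runs all tasks and merges their results, failing if any task failed. */
    static <T> List<T> runAll(List<Callable<List<T>>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<List<T>>> futures = new ArrayList<>();
        try {
            for (Callable<List<T>> task : tasks) {
                futures.add(executor.submit(task));
            }
        } finally {
            // Safe right after submission: tasks already submitted still run.
            executor.shutdown();
        }
        List<T> merged = new ArrayList<>();
        for (Future<List<T>> future : futures) {
            // get() rethrows a task failure as ExecutionException, so the
            // caller never receives a silently partial result.
            merged.addAll(future.get());
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<List<Long>>> tasks = new ArrayList<>();
        tasks.add(() -> Arrays.asList(1L, 2L));
        tasks.add(() -> Arrays.asList(3L));
        System.out.println(runAll(tasks, 2));
    }
}
```

A check on a value that never changes, such as the ancestorDir null check, would sit before the submit loop rather than inside each Callable.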
> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
> Key: HDFS-11402
> URL: https://issues.apache.org/jira/browse/HDFS-11402
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 2.6.0
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch,
> HDFS-11402.03.patch, HDFS-11402.04.patch
>
>
> *Problem:*
> 1. When files are being written while HDFS Snapshots are taken in
> parallel, the Snapshots do capture all these files, but the files being
> written do not have their point-in-time length captured in the Snapshots. That is,
> these open files are not frozen in HDFS Snapshots. These open files
> grow/shrink in length, just like the original file, even after the snapshot
> time.
> 2. At the time of File close or any other metadata modification operation on
> these files, HDFS reconciles the file length and records the modification in
> the last taken Snapshot. All the previously taken Snapshots continue to have
> those open Files with no modification recorded. So, all those previous
> snapshots end up using the final modification record in the last snapshot.
> Thus, after the file close, the file lengths in all those snapshots end up
> the same.
> Assume File1 is opened for write and a total of 1MB written to it. While the
> writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal:*
> 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can
> optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze
> open files.
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult with
> {{LeaseManager}} and get a list of INodesInPath for all open files under the
> snapshot dir.
> 3. {{DirectorySnapshottableFeature#addSnapshot}} after the Snapshot creation,
> Diff creation and updating modification time, can invoke
> {{INodeFile#recordModification}} for each of the open files. This way, the
> Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for
> each of the open files.
> 4. The above model follows the current Snapshot and Diff protocols and doesn't
> introduce any new disk formats. So, I don't think we will be needing any new
> FSImage Loader/Saver changes for Snapshots.
> 5. One of the design goals of HDFS Snapshots was the ability to take any number of
> snapshots in O(1) time. Although LeaseManager keeps all the open files with
> leases in an in-memory map, an iteration is still needed to select the relevant
> open files and then run recordModification on each of them. So, it will not be
> strictly O(1) with the above proposal. But it is going to be only a marginal
> increase, as the new order will be O(open_files_under_snap_dir). In order to
> avoid HDFS Snapshots change in behavior for open files and avoid change in
> time complexity, this improvement can be made under a new config
> {{"dfs.namenode.snapshot.freeze.openfiles"}} which by default can be
> {{false}}.
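
The difference the quoted description is after can be shown with a toy model. This is only a sketch under loud assumptions: it does not model FileDiff, leases, or the NameNode's view of file length, and {{SnapshotLengthSketch}} with all of its members is hypothetical. Without freezing, a snapshot of an open file reports the file's current length; with freezing, it reports the length recorded when the snapshot was taken.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one open file and its snapshots; not HDFS code. */
final class SnapshotLengthSketch {
    private final boolean freezeOpenFiles;
    private long length; // current length of the open file in this model
    private final Map<String, Long> frozenLengths = new HashMap<>();

    SnapshotLengthSketch(boolean freezeOpenFiles) {
        this.freezeOpenFiles = freezeOpenFiles;
    }

    void write(long bytes) {
        length += bytes;
    }

    /** Takes a snapshot; the length is recorded only when freezing is enabled. */
    void snapshot(String name) {
        if (freezeOpenFiles) {
            frozenLengths.put(name, length);
        }
    }

    /** Length seen through a snapshot: the frozen value if present, else the current one. */
    long lengthIn(String snapshotName) {
        return frozenLengths.getOrDefault(snapshotName, length);
    }

    public static void main(String[] args) {
        for (boolean freeze : new boolean[] {false, true}) {
            SnapshotLengthSketch file = new SnapshotLengthSketch(freeze);
            file.write(512);
            file.snapshot("Snap1");
            file.write(512);
            System.out.println("freeze=" + freeze + " Snap1 length=" + file.lengthIn("Snap1"));
        }
    }
}
```

The per-snapshot work in the freezing case is one map insert per open file, which matches the O(open_files_under_snap_dir) cost mentioned in item 5.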