[
https://issues.apache.org/jira/browse/HDFS-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537650#comment-13537650
]
Aaron T. Myers commented on HDFS-4103:
--------------------------------------
I've just finished reviewing the patch. Given that nothing's been committed to
the branch since this was committed, it seems easiest to me to revert the
commit, address this feedback, and then commit again with the feedback
incorporated. If for some reason you'd rather address this feedback via a
follow-up JIRA, that's fine by me as well.
The patch overall looks very good. I agree with all of the comments that Suresh
has previously posted. Of the comments below, I think that #16 is probably the
most serious.
# Some commented out code in TestSnapshot without explanation. Add a comment?
File a JIRA?
# TestNestedSnapshots - I'm really not a fan of disabling loggers in the tests.
If the tests fails, it may be impossible to tell why because the relevant log
message may have been disabled.
# With regard to the comment disabling the append test case of
TestSnapshotPathINodes - appreciate you adding the TODO, but can you please
elaborate in that comment about what the problem is currently with append?
And/or file a follow-up JIRA to fix this issue?
# There's a few "System.out.println("XXX ...")" left throughout the patch. If
you're going to leave these in, please make it a little clearer what the point
is, along the lines of "====> Test completed, dumbing tree for debugging <===="
or something.
# TestSnapshot - there's some commented out code which apparently would have
tested FileAppend. Add a TODO and/or file a JIRA?
# TestNestedSnapshots#print - please rename parameter "mess" to "message" or
"msg" or something clearer. Using "mess" threw me off a bit at first.
# TestNestedSnapshots#assertFile - could you please add a method comment here?
Doesn't have to be a proper javadoc necessarily, but it's a little tough to
tell what the method's doing at first blush. It's also a little odd that the
method appears to be written so as to support an arbitrary number of paths to
verify since it uses varargs, but it in fact only supports verifying the
presence of exactly 3 paths. Maybe switch to a three-arg method?
# TestNestedSnapshots#assertFile - please move "bar" to a constant, given that
it's hard-coded both here and in testNestedSnapshots.
# Seems like TestSnapshotReplication#getINodeFile can be made private.
# TestSnapshotPathINodes#testSnapshotPathINodesAfterModification - there's an
assertFalse(...) which compares two values for mod times. Please consider
adding an assert message there to print out the actual values of of the mod
times.
# Considering the many places where we now call INodeDirectory#addChild with
the third argument being "null", perhaps add a two-arg method which defaults to
passing null?
# SnapshotDiff#compareTo - please change that_snapshot to "thatSnapshot" or
"otherSnapshot"
# Considering that the Pair and Triple classes are pretty generic classes and
are now being used in several somewhat unrelated parts of the code, consider
moving them out of inner classes of INode and into some util package? Perhaps
in Common?
# Really good class comment for SnapshotDiff. Very helpful and well-described.
# Having SnapshotDiff as a non-static inner class of
INodeDirectorySnapshottable makes it a little difficult to follow the scope of
some of the references. I think it's a sufficiently meaty class to warrant
being a top-level class, with a reference to the associated
INodeDirectorySnapshottable. If for some reason you want to leave it as an
inner class, let's make it a static inner class.
# SnapshotDiff#getSnapshotINode - given that this may end up recursively
calling getSnapshotINode, it seems like we might easily end up overflowing the
stack if a directory has many snapshots taken of it with changes between each
snapshot. I believe by default a StackOverflowException will end up getting
called after a stack is 1024 method calls deep, yet the limit of how many
snapshots can exist per snapshottable directory is set to 65536.
# I recommend adding a class comment to INodeDirectoryWithSnapshot to explain a
little bit about how it does its work by overriding several methods of the
INodeDirectory class to look back into snapshots as appropriate. Once I
understood it it was very clear, but I think it could stand to be explained
with a comment.
> Support O(1) snapshot creation
> ------------------------------
>
> Key: HDFS-4103
> URL: https://issues.apache.org/jira/browse/HDFS-4103
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: Snapshot (HDFS-2802)
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4103_20121129.patch, h4103_20121202b.patch,
> h4103_20121202.patch, h4103_20121209b.patch, h4103_20121209.patch,
> h4103_20121210b.patch, h4103_20121210.patch, h4103_20121211.patch,
> h4103_20121212b.patch, h4103_20121212.patch, h4103_20121213.patch,
> h4103_20121215b.patch, h4103_20121215.patch, h4103_20121216.patch,
> h4103_20121217b.patch, h4103_20121217.patch, h4103_20121218.patch,
> h4103_20121219.patch, h4103_20121220.patch
>
>
> In our first snapshot implementation, snapshot creation runs in O(N) and
> occupies O(N) memory space, where N = # files + # directories + # symlinks in
> the snapshot. The advantages of the implementation are that there is no
> additional cost for the modifications after snapshots are created, and it
> leads to a simple implementation.
> In this JIRA, we optimize snapshot creation to O(1) although it introduces
> additional cost in the modifications after snapshots are created. Note that
> the INode is given as an assumption, otherwise, there is a non-constant cost
> to find the INode.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira