[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480503#comment-13480503
 ] 

Aaron T. Myers commented on HDFS-2802:
--------------------------------------

Hey guys, thanks a lot for posting an updated design document. I have one 
high-level concern about the design, and a number of specific comments / 
questions about the design document itself.

First, I'm concerned with the O(# of files + # of directories) nature of this 
design, both in terms of time taken to create a snapshot and the NN memory 
resources consumed. 

It seems to me that this will result in snapshot creation/retention being 
sufficiently costly as to make creating snapshots with this design not a viable 
option for large HDFS instances. I think another design requirement for this 
work should be "snapshot creation must be sufficiently fast as to be 
unnoticeable to clients" (an attempted definition of "reasonable" per 
high-level requirement #4 on page 1.) I don't think this design will satisfy 
such a requirement in large HDFS instances. Also, many large HDFS instances 
that I'm aware of already run with very large NN heaps, and a snapshot design 
which results in making copies of large portions of the working set will not be 
viable in these situations. At least, I think this design does not address the 
#3 high-level requirement on page 1 of "support for a reasonable number of 
snapshots." In deployments where the NameNode is already running with a large 
working set, this design might only allow for a single snapshot of the root of 
the file system, or none at all.

I think this design document is a very good start, but we really must figure 
out a way to create and retain snapshots in a more efficient manner, both in 
terms of time to create the snapshot and memory overhead to retain the 
snapshot, in order for the snapshot solution to be viable for HDFS. Having an 
O(# of files + # of directories) system will not be acceptable for any but the 
smallest HDFS installations. I think that creation of a snapshot should be 
either O(1) or worst case O(depth of file system tree). There are many 
precedents for file systems supporting more efficient snapshot creation than 
O(# of files + # of directories), e.g. WAFL, ZFS, BTRFS, etc.
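
As a rough illustration of the O(1) creation cost argued for above, here is a 
minimal copy-on-write sketch. It is not the design from the document, nor how 
WAFL, ZFS, or BTRFS are implemented; the names Node, put, and get are 
hypothetical. The namespace is modeled as an immutable tree, so taking a 
snapshot only keeps a reference to the current root, and a later write copies 
just the nodes on the modified path. In this toy version each copied directory 
also shallow-copies its child map, so a write costs O(depth x fan-out).

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Node:
    """Immutable namespace node: a directory (children) or a file (length)."""
    children: Mapping[str, "Node"] = field(default_factory=dict)
    length: Optional[int] = None  # None for directories


def _put(node: Node, parts: list, length: int) -> Node:
    if not parts:
        return Node(length=length)
    name, rest = parts[0], parts[1:]
    child = node.children.get(name, Node())
    # Shallow copy: sibling subtrees are shared with older roots, not cloned.
    new_children = dict(node.children)
    new_children[name] = _put(child, rest, length)
    return Node(children=new_children)


def put(root: Node, path: str, length: int) -> Node:
    """Return a new root with the file at `path` set to `length`."""
    return _put(root, [p for p in path.split("/") if p], length)


def get(root: Node, path: str) -> Optional[int]:
    """Return the file length at `path`, or None if the path does not exist."""
    node = root
    for name in (p for p in path.split("/") if p):
        node = node.children.get(name)
        if node is None:
            return None
    return node.length


live = put(put(Node(), "/user/a/f1", 10), "/user/b/f2", 20)
snapshot = live                     # snapshot creation: keep the current root
live = put(live, "/user/a/f1", 99)  # copies only /, /user, /user/a and f1

print(get(snapshot, "/user/a/f1"))  # 10
print(get(live, "/user/a/f1"))      # 99
```

The untouched subtree /user/b is the same object in both roots, which is why 
retaining the snapshot costs memory proportional to what changed afterwards 
rather than to the size of the namespace.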

Second, comments on the design document itself:

# I see that you're now considering read/write snapshots as an optional 
requirement. Per several of the comments on HDFS-233, it seems to me that 
supporting writable snapshots is an extra complexity that many folks aren't 
actually interested in. I suggest we explicitly punt on support for writable 
snapshots, and declare that snapshots are completely immutable. I think doing 
so may have the potential to allow us to make some simplifying design decisions.
# On page 2 the design mentions that "it should be possible to extend the 
current design to materialize the snapshot metadata and migrate it to outside 
the NameNode." I didn't see any other discussion of this in the rest of the 
document. Can you perhaps expand on what you mean by this?
# On page 2 you mention that there "are snapshot root directories that are 
configured by the system administrator to allow snapshots." How are these 
configured? Is it something that can be dynamically added to a running NN? Or 
would it need to be configured at NN startup time?
# In several places (e.g. the footnote on page 2, use case 2 on page 6) the 
design document refers to file system "volumes," in particular "Snapshots are 
created at the volume level simplifying administration." What "volume" are you 
referring to here?
# I think that the "detailed requirements" section on page 4 is missing a 
critical requirement: the snapshot must be consistent from the perspective of 
individual clients. The requirements state that the snapshot must be atomic, 
but not consistent. The consistency of the snapshot really must be well-defined 
and strong. For example, it would be unacceptable if files restored from an 
HDFS snapshot of a running hbase.rootdir resulted in a corrupted HBase instance.
# On page 4 you mention that a snapshot will have "a unique snapshot name for a 
given path." Can you expand upon that? How is this name created? Why is this 
necessary in addition to the "path where the snapshot is created" which as far 
as I can tell should also serve the purpose of uniquely identifying the 
snapshot?
# I think that the solution described on page 5 of the document for the length 
of files being written does not satisfy the requirement that I mentioned above 
that the snapshot must be consistent. In particular, I think I can construct a 
scenario wherein a client performs an NN-only metadata operation (A), then 
writes and hflushes some data (B), and then performs another metadata 
operation (C) right before a snapshot is created, and the resulting snapshot 
contains both metadata operations but not the data hflush'ed to the DNs, 
i.e. A and C but not B. This would result in the snapshot representing a moment 
in time that never existed from the point of view of that client. We can 
continue the discussion of this particular issue more on HDFS-3960, if you'd 
like.
# I'm glad that the document discusses atime - that is not something that I had 
considered in my thinking on HDFS snapshots. I am a little leery, however, of 
tracking atime at all in what is ostensibly a read-only snapshot. Does anyone 
know what other file systems that support read-only snapshots do with regard to 
atime?
# Regarding open question #2 on page 7, I would think that this should be a 
hard requirement if we go with this design, and the main motivation should be 
taking snapshots of different parts of the tree on different schedules. For 
example, an administrator may want to schedule a nightly snapshot of the whole 
FS, but hourly snapshots of their /user directories.
# Regarding open question #4 on page 7, I feel confident that this design does 
not currently produce consistent HBase snapshots, because of the issue I 
described in #7 above.
# One question regarding the user experience that I don't see described in the 
document: will creating a snapshot require super user privileges? Or can any 
user create a snapshot of a subdirectory? If the latter, what permissions are 
required to create a snapshot? What if the user doesn't have permissions on 
some files under the subtree of the snapshot target? Does this result in an 
incomplete snapshot? Or a completely failed snapshot? My personal inclination 
is to limit snapshot creation to super users only, as a simplification.
# One high-level comment on the document: I'm a little leery of introducing 
this new concept of "snapshottable directories." I'm not aware of any precedent 
in other file systems for this sort of restriction, and I fear that the concept 
may be confusing for administrators and operators of HDFS.
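
To make the A/B/C scenario from the file-length comment concrete, here is a toy 
model. It is a sketch under stated assumptions, not HDFS code: ToyNameNode and 
ToyClient are hypothetical classes, and the model assumes that the snapshot 
records the file length as the NameNode knows it at snapshot time and that an 
hflush reaches the DataNodes without updating that length.

```python
class ToyNameNode:
    """Holds only what the NameNode knows: metadata ops and a file length."""

    def __init__(self):
        self.ops = []        # metadata operations applied at the NN, in order
        self.nn_length = 0   # file length as last reported to the NN

    def snapshot(self):
        # Assumption: the snapshot captures NN state only.
        return {"ops": list(self.ops), "length": self.nn_length}


class ToyClient:
    def __init__(self, nn):
        self.nn = nn
        self.dn_length = 0   # bytes the client has flushed to the DataNodes

    def metadata_op(self, name):
        self.nn.ops.append(name)

    def hflush(self, nbytes):
        # Assumption: data reaches the DNs, but the NN is not told the length.
        self.dn_length += nbytes


nn = ToyNameNode()
client = ToyClient(nn)
client.metadata_op("A")   # (A) NN-only metadata operation
client.hflush(1024)       # (B) data written and hflushed to the DNs
client.metadata_op("C")   # (C) second metadata operation
snap = nn.snapshot()      # snapshot taken right after C

print(snap)               # {'ops': ['A', 'C'], 'length': 0}
print(client.dn_length)   # 1024
```

Under these assumptions the snapshot contains A and C but none of the bytes 
from B, a state the client never observed.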
                
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
