[
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480503#comment-13480503
]
Aaron T. Myers commented on HDFS-2802:
--------------------------------------
Hey guys, thanks a lot for posting an updated design document. I have one
high-level concern about the design, and a number of specific comments /
questions about the design document itself.
First, I'm concerned with the O(# of files + # of directories) nature of this
design, both in terms of time taken to create a snapshot and the NN memory
resources consumed.
It seems to me that this will result in snapshot creation/retention being
sufficiently costly as to make creating snapshots with this design not a viable
option for large HDFS instances. I think another design requirement for this
work should be "snapshot creation must be sufficiently fast as to be
unnoticeable to clients" (an attempted definition of "reasonable" per
high-level requirement #4 on page 1.) I don't think this design will satisfy
such a requirement in large HDFS instances. Also, many large HDFS instances
that I'm aware of already run with very large NN heaps, and a snapshot design
which results in making copies of large portions of the working set will not be
viable in these situations. At least, I think this design does not address the
#3 high-level requirement on page 1 of "support for a reasonable number of
snapshots." In deployments where the NameNode is already running with a large
working set, this design might only allow for a single snapshot of the root of
the file system, or none at all.
I think this design document is a very good start, but we really must figure
out a way to create and retain snapshots in a more efficient manner, both in
terms of time to create the snapshot and memory overhead to retain the
snapshot, in order for the snapshot solution to be viable for HDFS. Having an
O(# of files + # of directories) system will not be acceptable for any but the
smallest HDFS installations. I think that creation of a snapshot should be
either O(1) or worst case O(depth of file system tree). There are many
precedents for file systems supporting more efficient snapshot creation than
O(# of files + # of directories), e.g. WAFL, ZFS, BTRFS, etc.
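To make the O(1) creation target concrete, here is a minimal sketch of the general copy-on-write idea: namespace nodes are immutable, a modification copies only the directories on the path to the change, and a snapshot is just a retained reference to the current root. Everything in it (Node, put, lookup, Namespace) is a hypothetical name invented for illustration; it does not describe the NameNode's INode structures, the design document, or the internals of any of the file systems named above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable namespace entry: children is None for a file, a dict for a directory."""
    name: str
    children: Optional[Dict[str, "Node"]] = None


def split(path: str) -> Tuple[str, ...]:
    """Split '/a/b/c' into ('a', 'b', 'c')."""
    return tuple(part for part in path.split("/") if part)


def put(node: Node, parts: Tuple[str, ...], leaf: Node) -> Node:
    """Return a new tree with `leaf` at `parts`, copying only the directories on that path."""
    if not parts:
        return leaf
    children = dict(node.children or {})  # shallow copy of this one directory
    child = children.get(parts[0], Node(parts[0], {}))
    children[parts[0]] = put(child, parts[1:], leaf)
    return Node(node.name, children)


def lookup(node: Node, path: str) -> Optional[Node]:
    """Resolve `path` under `node`; return None if any component is missing."""
    for part in split(path):
        if node.children is None or part not in node.children:
            return None
        node = node.children[part]
    return node


class Namespace:
    """Hypothetical toy namespace; not the HDFS FSNamesystem."""

    def __init__(self) -> None:
        self.root = Node("/", {})
        self.snapshots: Dict[str, Node] = {}

    def create_file(self, path: str) -> None:
        parts = split(path)
        # One directory copy per level of the path, not per file in the tree.
        self.root = put(self.root, parts, Node(parts[-1]))

    def snapshot(self, label: str) -> None:
        # O(1): retain a reference to the current immutable root; nothing is copied.
        self.snapshots[label] = self.root


ns = Namespace()
ns.create_file("/user/alice/a.txt")
ns.snapshot("s1")
ns.create_file("/user/alice/b.txt")
print(lookup(ns.snapshots["s1"], "/user/alice/b.txt") is None)
print(lookup(ns.root, "/user/alice/b.txt") is not None)
```

In this sketch the snapshot costs one reference, and each later modification costs one shallow directory copy per level of the affected path, so unchanged subtrees stay shared between the live tree and every snapshot. The price is that all updates must go through path copying rather than in-place mutation.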
Second, comments on the design document itself:
# I see that you're now considering read/write snapshots as an optional
requirement. Per several of the comments on HDFS-233, it seems to me that
supporting writable snapshots is an extra complexity that many folks aren't
actually interested in. I suggest we explicitly punt on support for writable
snapshots, and declare that snapshots are completely immutable. I think doing
so may have the potential to allow us to make some simplifying design decisions.
# On page 2 the design mentions that "it should be possible to extend the
current design to materialize the snapshot metadata and migrate it to outside
the NameNode." I didn't see any other discussion of this in the rest of the
document. Can you perhaps expand on what you mean by this?
# On page 2 you mention that there "are snapshot root directories that are
configured by the system administrator to allow snapshots." How are these
configured? Is it something that can be dynamically added to a running NN? Or
would it need to be configured at NN startup time?
# In several places (e.g. the footnote on page 2, use case 2 on page 6) the
design document refers to file system "volumes," in particular "Snapshots are
created at the volume level simplifying administration." What "volume" are you
referring to here?
# I think that the "detailed requirements" section on page 4 is missing a
critical requirement: the snapshot must be consistent from the perspective of
individual clients. The requirements state that the snapshot must be atomic,
but not consistent. The consistency of the snapshot really must be well-defined
and strong. For example, it would be unacceptable if files restored from an
HDFS snapshot of a running hbase.rootdir resulted in a corrupted HBase instance.
# On page 4 you mention that a snapshot will have "a unique snapshot name for a
given path." Can you expand upon that? How is this name created? Why is this
necessary in addition to the "path where the snapshot is created" which as far
as I can tell should also serve the purpose of uniquely identifying the
snapshot?
# I think that the solution described on page 5 of the document for the length
of files being written does not satisfy the requirement that I mentioned above
that the snapshot must be consistent. In particular, I think I can construct a
scenario wherein a client which performs an NN-only metadata operation (A),
then writes and hflushes some data (B), and then performs another metadata
operation (C) right before a snapshot is created may result in the snapshot
containing both metadata operations, but not the data hflush'ed to the DNs,
i.e. A and C but not B. This would result in the snapshot representing a moment
in time that never existed from the point of view of that client. We can
continue the discussion of this particular issue more on HDFS-3960, if you'd
like.
# I'm glad that the document discusses atime - that is not something that I had
considered in my thinking on HDFS snapshots. I am a little leery, however, of
tracking atime at all in what is ostensibly a read-only snapshot. Does anyone
know what other file systems that support read-only snapshots do with regard to
atime?
# Regarding open question #2 on page 7, I would think that this should be a
hard requirement if we go with this design, and the main motivation should be
taking snapshots of different parts of the tree on different schedules. For
example, an administrator may want to schedule a nightly snapshot of the whole
FS, but hourly snapshots of their /user directories.
# Regarding open question #4 on page 7, I feel confident that this design does
not currently result in producing consistent HBase snapshots, because of the
file-length issue I described in comment #7 above.
# One question regarding the user experience that I don't see described in the
document: will creating a snapshot require super user privileges? Or can any
user create a snapshot of a subdirectory? If the latter, what permissions are
required to create a snapshot? What if the user doesn't have permissions on
some files under the subtree of the snapshot target? Does this result in an
incomplete snapshot? Or a completely failed snapshot? My personal inclination
is to limit snapshot creation to super users only, as a simplification.
# One high-level comment on the document: I'm a little leery of introducing
this new concept of "snapshottable directories." I'm not aware of any precedent
in other file systems for this sort of restriction, and I fear that the concept
may be confusing for administrators and operators of HDFS.
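The A/B/C scenario from comment #7 above can be sketched with a toy model. It assumes, purely for illustration, that a snapshot records only the NameNode's view of a file's length and that an hflush moves data to the DataNodes without updating that view; whether the design document's solution behaves this way is exactly the open question. ToyCluster and all of its methods are hypothetical names, not HDFS APIs.

```python
class ToyCluster:
    """Toy model: NameNode metadata and DataNode data are tracked separately."""

    def __init__(self):
        self.nn_ops = []      # metadata operations applied at the NameNode, in order
        self.nn_length = {}   # file length as the NameNode knows it (assumption)
        self.dn_length = {}   # bytes actually hflushed to the DataNodes

    def open_for_write(self, path):
        self.nn_length[path] = 0
        self.dn_length[path] = 0

    def metadata_op(self, op):
        self.nn_ops.append(op)

    def write_and_hflush(self, path, nbytes):
        # Assumption: the NameNode is not told about the new length.
        self.dn_length[path] += nbytes

    def client_view(self, path):
        """State as the writing client observes it at this moment."""
        return (tuple(self.nn_ops), self.dn_length[path])

    def snapshot(self, path):
        """Snapshot built from NameNode state only (assumption)."""
        return (tuple(self.nn_ops), self.nn_length[path])


cluster = ToyCluster()
cluster.open_for_write("/f")
history = [cluster.client_view("/f")]

cluster.metadata_op("A")             # operation A: NN-only metadata change
history.append(cluster.client_view("/f"))
cluster.write_and_hflush("/f", 100)  # operation B: data hflushed to the DNs
history.append(cluster.client_view("/f"))
cluster.metadata_op("C")             # operation C: another metadata change
history.append(cluster.client_view("/f"))

snap = cluster.snapshot("/f")
print(snap)
print(snap in history)
```

Under these assumptions the snapshot holds A and C together with a length of 0, a combination that appears nowhere in the client's own history, which is the "moment in time that never existed" problem.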
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, name-node
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire
> filesystem. Snapshots can be a read-only or a read-write point in time copy
> of the filesystem. There are several use cases for snapshots in HDFS. I will
> post a detailed write-up soon with more information.