[
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron T. Myers updated HDFS-2802:
---------------------------------
Attachment: HDFSSnapshotsDesign.pdf
Hi all, attached please find a somewhat different design for implementing
snapshot support in HDFS that a few others and I have discussed. Please
have a look at it.
Though this design differs somewhat from the previous design posted by
Nicholas, I don't think the two designs are insurmountably far apart. Though I
certainly don't expect to switch to this design wholesale, I would like to see
if we can come up with a hybrid design which incorporates some aspects of both.
Let me try to outline where I see the designs differing, and suggest ways we
can move forward with a hybrid design.
# *Efficiency of snapshot creation.* In the design posted by Nicholas, creation
of a snapshot is O(n) in the number of files/directories captured by the
snapshot, in both time and space. The design proposed in this document would
be O(1) at snapshot creation time, and then copy-on-write thereafter for
files/directories which are modified after the snapshot is created. This is
accomplished by assigning unique, increasing integer IDs to snapshots and
giving each INode a start_snap and end_snap ID to denote which snapshots the
INode should be a part of. I'm not wedded to the precise design described in
this document, but it seems like a reasonable design to me, so I'd like to
consider this for the design to implement HDFS-4103 (Support O(1) snapshot
creation).
# *Support for subdirectory snapshots.* The design posted by Nicholas allows
for individual subdirectories of an HDFS namespace to be snapshotted by
introducing "snapshottable directories." The design proposed in this document
would only support snapshots at the root level of the file system. I think an
easy way to produce a hybrid between these two designs would be to stick with
the "snapshottable directory" system described in the document posted by
Nicholas, and store the snapshot ID info at that INodeDirectory, instead of
globally for the whole file system as is described in the document I've just
posted. Such a scheme will allow both for efficient snapshot creation and
creation of snapshots of subdirectories of the file system.
# *Support for non-super users to create snapshots.* The design posted by
Nicholas allows for non-super users to create snapshots. The scheme described
in the document I've just posted would only allow super users to create
snapshots, for instances where administrators want tight control over the
snapshots in their system. I propose we stick with the design described in the
document posted by Nicholas, but allow for user-initiated snapshot creation to
be optionally disabled by the administrator, either globally or
per-snapshottable directory. This should allow for both use cases
simultaneously.
# *Materialization of snapshots.* The scheme described in the document posted
by Nicholas allows for the state of the FS in a snapshot to be accessed only
from the snapshot root, i.e. the snapshottable directory, and allows for
snapshots to be created with arbitrary names. The scheme described by the
document I've just posted would have the return value of
ClientProtocol#getListing modified on the fly by the NameNode so that a
".snapshots" directory will appear to be present in every directory which has a
snapshot available for it, with the available snapshots listed under this
"directory" by their snapshot ID. This is similar to the user experience that
users of WAFL file systems are accustomed to, and so should be familiar to many
users of FS snapshots. I'd like us to consider going with this scheme.
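To make the ID scheme in point 1 and the per-directory storage suggested in
point 2 more concrete, here is a minimal Java sketch. It is not taken from
either design document or from HDFS code: the class SnapshottableDirSketch,
its Version entries, and every method name are hypothetical, and a flat list
of file versions stands in for real INodes. It assumes that a version created
after snapshot k gets start_snap = k + 1, and that snapshot s contains a
version when start_snap <= s <= end_snap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HDFS code: one snapshottable directory that keeps
// its own snapshot counter and tags each file version with an ID range.
class SnapshottableDirSketch {
    // Sentinel end_snap for a version that is still live.
    static final int OPEN = Integer.MAX_VALUE;

    static final class Version {
        final String name;
        final String content;
        final int startSnap; // first snapshot ID that can contain this version
        int endSnap = OPEN;  // last snapshot ID that contains this version

        Version(String name, String content, int startSnap) {
            this.name = name;
            this.content = content;
            this.startSnap = startSnap;
        }
    }

    private final List<Version> versions = new ArrayList<>();
    private int latestSnap = 0; // 0 means no snapshot has been taken yet

    // O(1) creation: nothing is copied, only the counter advances.
    int createSnapshot() {
        return ++latestSnap;
    }

    // Copy-on-write: the first write after a snapshot freezes the old version.
    void write(String name, String content) {
        Version live = findLive(name);
        if (live != null) {
            if (live.startSnap > latestSnap) {
                versions.remove(live);     // no snapshot captured it, so replace it
            } else {
                live.endSnap = latestSnap; // keep it for snapshots up to latestSnap
            }
        }
        versions.add(new Version(name, content, latestSnap + 1));
    }

    // Content of a file as seen by snapshot snapId, or null if absent.
    String readFromSnapshot(String name, int snapId) {
        if (snapId < 1 || snapId > latestSnap) {
            return null;
        }
        for (Version v : versions) {
            if (v.name.equals(name) && v.startSnap <= snapId && snapId <= v.endSnap) {
                return v.content;
            }
        }
        return null;
    }

    // Current content of a file, or null if it does not exist.
    String readLive(String name) {
        Version live = findLive(name);
        return live == null ? null : live.content;
    }

    // Entry names that a synthetic ".snapshots" listing could show (point 4).
    List<String> listSnapshotIds() {
        List<String> ids = new ArrayList<>();
        for (int id = 1; id <= latestSnap; id++) {
            ids.add(Integer.toString(id));
        }
        return ids;
    }

    private Version findLive(String name) {
        for (Version v : versions) {
            if (v.name.equals(name) && v.endSnap == OPEN) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SnapshottableDirSketch dir = new SnapshottableDirSketch();
        dir.write("a", "v1");
        int s1 = dir.createSnapshot();
        dir.write("a", "v2");
        System.out.println(dir.readFromSnapshot("a", s1)); // prints v1
        System.out.println(dir.readLive("a"));             // prints v2
    }
}
```

The point of the sketch is that createSnapshot does no per-file work; the cost
moves to the first write of each file after a snapshot. Deletes, nested
directories, and persistence are deliberately left out.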
Please consider this proposal. I'd love to discuss this further at the design
meeting later this week as previously mentioned by Suresh. By the way, can we
nail down the precise date/time for that meeting? Sanjay mentioned to me
offline that it would probably be on Wednesday, but I haven't heard anything
beyond that. I'd be happy to offer up space in the Cloudera office, if that
would be helpful. Let me know.
Thanks everyone.
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, name-node
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: HDFSSnapshotsDesign.pdf, snap.patch,
> snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire
> filesystem. Snapshots can be a read-only or a read-write point in time copy
> of the filesystem. There are several use cases for snapshots in HDFS. I will
> post a detailed write-up soon with more information.