[jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS

Aaron T. Myers (JIRA) Sun, 28 Oct 2012 17:19:13 -0700

     [ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Aaron T. Myers updated HDFS-2802:
---------------------------------

    Attachment: HDFSSnapshotsDesign.pdf

Hi all, attached please find a somewhat different design for implementing 
snapshot support in HDFS that a few others and I have discussed. Please have 
a look at it.

Though this design differs somewhat from the previous design posted by 
Nicholas, I don't think the two designs are insurmountably far apart. While I 
certainly don't expect to switch to this design wholesale, I would like to see 
if we can come up with a hybrid design that incorporates aspects of both. 
Let me try to outline where I see the designs differing, and suggest ways we 
can move forward with a hybrid design.

# *Efficiency of snapshot creation.* In the design posted by Nicholas, creating 
a snapshot is O(n) in the number of files/directories captured by the 
snapshot, in both time and space. The design proposed in this document would 
be O(1) at snapshot creation time, with copy-on-write thereafter for any 
files/directories modified after the snapshot is created. This is accomplished 
by assigning unique, increasing integer IDs to snapshots and giving each INode 
a start_snap and end_snap ID to denote which snapshots the INode should be a 
part of. I'm not wedded to the precise design described in this document, but 
it seems reasonable to me, so I'd like to consider it as the design for 
implementing HDFS-4103 (Support O(1) snapshot creation).
# *Support for subdirectory snapshots.* The design posted by Nicholas allows 
individual subdirectories of an HDFS namespace to be snapshotted by 
introducing "snapshottable directories." The design proposed in this document 
would only support snapshots at the root level of the file system. I think an 
easy way to produce a hybrid of these two designs would be to keep the 
"snapshottable directory" system described in the document posted by 
Nicholas, and store the snapshot ID info at that INodeDirectory, instead of 
globally for the whole file system as described in the document I've just 
posted. Such a scheme would allow both efficient snapshot creation and 
snapshots of subdirectories of the file system.
# *Support for non-super users to create snapshots.* The design posted by 
Nicholas allows non-super users to create snapshots. The scheme described in 
the document I've just posted would only allow super users to create 
snapshots, which suits deployments where administrators want tight control 
over the snapshots in their system. I propose we stick with the design 
described in the document posted by Nicholas, but allow the administrator to 
optionally disable user-initiated snapshot creation, either globally or per 
snapshottable directory. This should support both use cases simultaneously.
# *Materialization of snapshots.* The scheme described in the document posted 
by Nicholas allows the state of the FS in a snapshot to be accessed only from 
the snapshot root, i.e. the snapshottable directory, and allows snapshots to be 
created with arbitrary names. In the scheme described in the document I've just 
posted, the NameNode would modify the return value of ClientProtocol#getListing 
on the fly so that a ".snapshots" directory appears to be present in every 
directory that has a snapshot available for it, with the available snapshots 
listed under this "directory" by their snapshot ID. This mirrors the user 
experience that WAFL file system users are familiar with, and so should be 
familiar to many users of FS snapshots. I'd like us to consider going with 
this scheme.
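
A minimal Java sketch of the start_snap/end_snap scheme from the first point, 
to show why snapshot creation is O(1). Everything here is hypothetical and not 
HDFS code: VersionedINode and SnapshotNamespace are illustrative names, a 
String stands in for INode metadata, versions live in a flat list, and treating 
[start_snap, end_snap) as a half-open range of snapshot IDs is an assumption 
about how the IDs would be interpreted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HDFS code: one version of a file's INode, visible
// in the half-open range of epochs/snapshot IDs [startSnap, endSnap).
class VersionedINode {
    static final long LIVE = Long.MAX_VALUE; // endSnap of a version not yet superseded

    final String path;
    String content;        // stands in for the INode's real metadata
    final long startSnap;  // first epoch in which this version exists
    long endSnap = LIVE;   // first epoch in which it no longer exists

    VersionedINode(String path, String content, long startSnap) {
        this.path = path;
        this.content = content;
        this.startSnap = startSnap;
    }

    boolean visibleIn(long epoch) {
        return startSnap <= epoch && epoch < endSnap;
    }
}

// Hypothetical namespace: snapshot k freezes everything visible in epoch k.
class SnapshotNamespace {
    private long currentEpoch = 0;
    private final List<VersionedINode> versions = new ArrayList<>();

    // O(1): no INodes are touched; the current epoch simply becomes a snapshot.
    long createSnapshot() {
        return currentEpoch++;
    }

    void create(String path, String content) {
        versions.add(new VersionedINode(path, content, currentEpoch));
    }

    // Copy-on-write: a version is copied only if some snapshot can see it.
    void modify(String path, String newContent) {
        VersionedINode live = find(path, currentEpoch);
        if (live == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (live.startSnap == currentEpoch) {
            live.content = newContent;    // no snapshot sees it yet: update in place
        } else {
            live.endSnap = currentEpoch;  // freeze the old version for older snapshots
            versions.add(new VersionedINode(path, newContent, currentEpoch));
        }
    }

    // Reads from a snapshot (epoch < currentEpoch) or the live namespace.
    String read(String path, long epoch) {
        VersionedINode v = find(path, epoch);
        return v == null ? null : v.content;
    }

    String readCurrent(String path) {
        return read(path, currentEpoch);
    }

    int versionCount() {
        return versions.size();
    }

    private VersionedINode find(String path, long epoch) {
        for (VersionedINode v : versions) {
            if (v.path.equals(path) && v.visibleIn(epoch)) {
                return v;
            }
        }
        return null;
    }
}
```

Modifying a file twice between snapshots copies it only once; the second write 
updates the not-yet-snapshotted version in place.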
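
Along the lines of the hybrid suggested in the second point, a sketch that 
keeps snapshot IDs per snapshottable directory rather than in one global 
counter. SnapshottableDir and SnapshotRoots are hypothetical names, paths are 
assumed to be normalized and absolute, and resolving a file to its nearest 
snapshottable ancestor is an assumption about how lookups would work.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not HDFS code: snapshot IDs are allocated per
// snapshottable directory instead of from one global counter.
class SnapshottableDir {
    final String path;
    private long nextSnapshotId = 0;
    private final Map<String, Long> idsByName = new LinkedHashMap<>();

    SnapshottableDir(String path) {
        this.path = path;
    }

    // O(1): record a name -> ID mapping and advance this directory's counter.
    long createSnapshot(String name) {
        if (idsByName.containsKey(name)) {
            throw new IllegalArgumentException("snapshot exists: " + name);
        }
        long id = nextSnapshotId++;
        idsByName.put(name, id);
        return id;
    }

    Long idOf(String name) {
        return idsByName.get(name);
    }
}

// Hypothetical registry mapping a path to its nearest snapshottable ancestor.
class SnapshotRoots {
    private final Map<String, SnapshottableDir> roots = new HashMap<>();

    SnapshottableDir allowSnapshot(String dir) {
        return roots.computeIfAbsent(dir, SnapshottableDir::new);
    }

    // Walks up a normalized absolute path: "/a/b/c" -> "/a/b" -> "/a" -> "/".
    SnapshottableDir rootFor(String path) {
        String p = path;
        while (true) {
            SnapshottableDir d = roots.get(p);
            if (d != null) {
                return d;
            }
            if (p.equals("/")) {
                return null;  // no snapshottable ancestor
            }
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "/" : p.substring(0, slash);
        }
    }
}
```

Because each directory owns its counter, snapshots of /user/alice and 
/user/bob advance independently.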
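
For the third point, a sketch of the proposed administrator control over 
user-initiated snapshots, with both a global switch and a per-directory 
switch. SnapshotCreationPolicy and its methods are hypothetical, and the 
requirement that a non-super user own the snapshottable directory is an 
assumption for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not HDFS code: super users may always snapshot, while
// user-initiated snapshots can be switched off globally or per directory.
class SnapshotCreationPolicy {
    private boolean userSnapshotsEnabled = true;                   // global switch
    private final Set<String> userSnapshotsDisabled = new HashSet<>(); // per-directory switch

    void setUserSnapshotsEnabled(boolean enabled) {
        userSnapshotsEnabled = enabled;
    }

    void disableUserSnapshots(String snapshottableDir) {
        userSnapshotsDisabled.add(snapshottableDir);
    }

    void enableUserSnapshots(String snapshottableDir) {
        userSnapshotsDisabled.remove(snapshottableDir);
    }

    // ownsDir: assumed (hypothetical) rule that a non-super user must own the directory.
    boolean mayCreateSnapshot(boolean isSuperUser, boolean ownsDir, String snapshottableDir) {
        if (isSuperUser) {
            return true;
        }
        if (!userSnapshotsEnabled || userSnapshotsDisabled.contains(snapshottableDir)) {
            return false;
        }
        return ownsDir;
    }
}
```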
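
A sketch of how the ".snapshots" materialization in the fourth point could 
decorate a directory listing. This is not the real ClientProtocol#getListing 
signature: SnapshotListingDecorator is hypothetical, listings are plain lists 
of names, and which directories "have a snapshot available" is simply recorded 
by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not the real ClientProtocol#getListing: decorates a
// directory listing with a virtual ".snapshots" entry.
class SnapshotListingDecorator {
    static final String SNAPSHOT_DIR = ".snapshots";

    // Directory path -> IDs of the snapshots available for it.
    private final Map<String, SortedSet<Long>> snapshotsByDir = new TreeMap<>();

    void recordSnapshot(String dir, long snapshotId) {
        snapshotsByDir.computeIfAbsent(dir, k -> new TreeSet<>()).add(snapshotId);
    }

    // realChildren: what would be listed for dir if snapshots did not exist.
    List<String> getListing(String dir, List<String> realChildren) {
        String suffix = "/" + SNAPSHOT_DIR;
        if (dir.endsWith(suffix)) {
            // Listing the virtual directory itself: one entry per snapshot ID.
            String parent = dir.substring(0, dir.length() - suffix.length());
            if (parent.isEmpty()) {
                parent = "/";
            }
            List<String> ids = new ArrayList<>();
            for (long id : snapshotsByDir.getOrDefault(parent, new TreeSet<>())) {
                ids.add(Long.toString(id));
            }
            return ids;
        }
        List<String> result = new ArrayList<>(realChildren);
        SortedSet<Long> snaps = snapshotsByDir.get(dir);
        if (snaps != null && !snaps.isEmpty()) {
            result.add(SNAPSHOT_DIR);  // appears only where snapshots exist
        }
        return result;
    }
}
```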

Please consider this proposal. I'd love to discuss it further at the design 
meeting later this week that Suresh mentioned previously. By the way, can we 
nail down the precise date/time for that meeting? Sanjay mentioned to me 
offline that it would probably be on Wednesday, but I haven't heard anything 
beyond that. I'd be happy to offer space in the Cloudera office, if that 
would be helpful. Let me know.

Thanks everyone.
                
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: HDFSSnapshotsDesign.pdf, snap.patch, 
> snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point-in-time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point-in-time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
