[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS

Konstantin Shvachko (JIRA) Tue, 30 Oct 2012 22:33:16 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487547#comment-13487547
 ]


Konstantin Shvachko commented on HDFS-2802:
-------------------------------------------

I propose to divide this discussion into three categories: design goals, API 
and semantics, algorithms and implementation. If people can agree on one we can 
move to the next.
# I see three main *design goals* proposed: snapshots should be (a) read-only, 
(b) directory-level, (c) multiple.
This should hopefully work for everybody.
# *API*. Seems to me the most important point now.
HDFSSnapshotsDesign.pdf doesn't talk much about APIs except a reference to WAFL.
Snapshots20121030.pdf has examples of shell commands, which look a bit 
convoluted. I mean, using the ".snapshot" delimiter to specify a snapshot means 
I cannot have entries with that name.
Wouldn't it be better to control access to snapshots via a -version option?
{{rm -r -version 3 /user/shv/hbase/}}  removes the snapshot with id 3.
{{ls -version 2 /user/shv/hbase/}}  lists the contents of snapshot #2.
{{ls -versions /user/shv/hbase/}}  lists the snapshot ids of the directory.
Non-versioned commands would deal with the "current" state of the file system, 
as they do today.
I like the idea of generating globally unique version ids, and assigning them 
to snapshots internally rather than letting people invent their own. One can 
always list available versions and read the desired one. So the -createSnapshot 
command does not need to take a <snapname>; it would return the generated id 
instead.
# *Algorithms*. I agree the length of an under-construction file in the 
snapshot should come directly from the namespace. And we provide a means to 
update it with hflush before the snapshot is taken.
Creating duplicate INodes with a diff is a sort of COW technique, right? 
Sounds hard.
It is simpler for me to think of versioned files and directories in this case. 
Creating a snapshot assigns a new version to objects.
Deleting a file should remove the current version but leave other versions 
unchanged. This can be implemented by marking the file "deleted" until all 
versions disappear, at which point it can be physically removed.
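
To make the versioned model in items 2 and 3 a bit more concrete, here is a minimal sketch in Python. It is not HDFS code, and every name in it (VersionedDir, VersionedFile, create_snapshot, remove_snapshot and so on) is hypothetical. It assumes snapshot ids come from a simple counter and it ignores file contents, lengths and nested directories; it only shows ids being generated internally and returned to the caller, versioned versus non-versioned listing, and a deleted file lingering until no version refers to it.

```python
import itertools


class VersionedFile:
    """A file plus the set of snapshot ids that captured it (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.versions = set()   # snapshot ids that still reference this file
        self.deleted = False    # True once removed from the "current" state


class VersionedDir:
    """Toy model of a directory with internally generated snapshot ids."""

    # Assumption: a process-wide counter stands in for globally unique ids.
    _ids = itertools.count(1)

    def __init__(self):
        self.files = []         # live files and deleted-but-still-versioned ones
        self.snapshots = set()  # ids of snapshots taken on this directory

    def create(self, name):
        self.files.append(VersionedFile(name))

    def create_snapshot(self):
        """Assign a new version to every live file and return the new id."""
        vid = next(VersionedDir._ids)
        for f in self.files:
            if not f.deleted:
                f.versions.add(vid)
        self.snapshots.add(vid)
        return vid

    def ls(self, version=None):
        """Without a version, list the current state; otherwise list a snapshot."""
        if version is None:
            return sorted(f.name for f in self.files if not f.deleted)
        if version not in self.snapshots:
            raise KeyError(version)
        return sorted(f.name for f in self.files if version in f.versions)

    def versions(self):
        """List the snapshot ids of this directory."""
        return sorted(self.snapshots)

    def delete(self, name):
        """Remove the current version only; snapshot versions are untouched."""
        for f in self.files:
            if f.name == name and not f.deleted:
                f.deleted = True
        self._purge()

    def remove_snapshot(self, version):
        """Drop one snapshot and release the files only it was holding."""
        self.snapshots.discard(version)
        for f in self.files:
            f.versions.discard(version)
        self._purge()

    def _purge(self):
        # Physically remove files marked deleted that no version references.
        self.files = [f for f in self.files if not (f.deleted and not f.versions)]
```

In this sketch, deleting a file after a snapshot keeps it visible through {{ls(version)}} but hides it from the plain {{ls()}}, and the object is only dropped once the last snapshot referencing it is removed.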

My dumb question: can I create a snapshot of a subdirectory that is a part of a 
snapshot above it?
                
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: HDFSSnapshotsDesign.pdf, snap.patch, 
> snapshot-one-pager.pdf, Snapshots20121018.pdf, Snapshots20121030.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
