[ 
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480547#comment-13480547
 ] 

Suresh Srinivas commented on HDFS-2802:
---------------------------------------

Thanks for the comments, guys.



bq. In some of the most commercially popular systems which implement snapshots, 
snapshots do not count against the disk quotas
How do they handle disk quota use when the original file is deleted and only 
snapshots exist? The data still occupies disk space at that point, which is 
why counting snapshots against the disk quota makes sense.

bq. First, I'm concerned with the O(# of files + # of directories) nature of 
this design, both in terms of time taken to create a snapshot and the NN memory 
resources consumed.
I agree with you on this. We wanted to begin with this approach and then 
optimize memory use further. The initial patch uploaded here attempted 
premature optimization of both memory and snapshot creation time, which made 
the code really complicated. Reducing both is a definite goal, and we will 
update that part of the design as we continue to work. This is covered in the 
open issues/future work section.

comment 1:
Agree with this part. As we continue the work, we can make a decision on this. 
For supporting RW, let's not make the design/implementation more complicated.

comment 2:
Will address this as we continue to add more details to the design in the next 
update.


Comment 3, 6:
I want to make sure you understand this is early design and we will continue to 
add more details. I think some of the questions will be answered by how this 
works:
- Admin can mark directories as snapshottable using CLI
- User can then create snapshots for these directories using CLI/API. A 
snapshot has a snapshot name, and it is unique for a given snapshot root.
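
To make the two steps above concrete, here is a minimal sketch in plain Java. 
It is a toy in-memory model, not HDFS code: the class and method names 
(SnapshotWorkflowSketch, allowSnapshot, createSnapshot, snapshotCount) are made 
up for illustration, and the actual CLI/API may end up looking different.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the workflow; none of these names come from HDFS.
class SnapshotWorkflowSketch {
    // Snapshottable directory (snapshot root) -> snapshot names taken under it.
    private final Map<String, Set<String>> snapshotsByRoot = new HashMap<>();

    // Admin step: mark a directory as snapshottable.
    void allowSnapshot(String dir) {
        snapshotsByRoot.putIfAbsent(dir, new HashSet<>());
    }

    // User step: create a named snapshot under a snapshottable directory.
    void createSnapshot(String dir, String name) {
        Set<String> names = snapshotsByRoot.get(dir);
        if (names == null) {
            throw new IllegalStateException(dir + " is not snapshottable");
        }
        // The name only has to be unique within its own snapshot root.
        if (!names.add(name)) {
            throw new IllegalArgumentException(
                "snapshot " + name + " already exists under " + dir);
        }
    }

    // Number of snapshots currently recorded for a directory.
    int snapshotCount(String dir) {
        Set<String> names = snapshotsByRoot.get(dir);
        return names == null ? 0 : names.size();
    }
}
```

In this sketch the same snapshot name can be reused under two different 
snapshot roots, but creating it twice under the same root is rejected, and so 
is creating a snapshot under a directory the admin has not marked.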

comment 4:
If you look at snapshot implementation in other systems it is done at volume 
level. That is the parallel we are talking about.

Comment 5, Comment 7, comment 10:
As regards consistency (comment 7), a system where the snapshot is taken at the 
namespace without involving the data layer cannot provide a strong consistency 
guarantee. I also think it may not be relevant where writers are different from 
the client that is taking the snapshot. I am not sure what guarantee such a 
client can expect/depend on, given that the writers are separate. We could 
discuss this during design review. Based on discussion with a few HBase folks, 
I also think they should be okay with it. Something to discuss with them. I am 
also not clear on their dependency on HDFS with HBASE-6055.

comment 8:
This could change during implementation if we think access time may not be that 
important to maintain.

comment 9:
Agreed. I am leaning towards allowing it.

comment 11:
Will add use cases.

comment 12:
See the volume comment (comment 4); the document also covers this to some 
extent. We could discuss this further if the document is not clear.




                
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire 
> filesystem. Snapshots can be a read-only or a read-write point in time copy 
> of the filesystem. There are several use cases for snapshots in HDFS. I will 
> post a detailed write-up soon with more information.
