[
https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480547#comment-13480547
]
Suresh Srinivas commented on HDFS-2802:
---------------------------------------
Thanks for the comments, guys.
bq. In some of the most commercially popular systems which implement snapshots,
snapshots do not count against the disk quotas
How do they handle disk quota use when the original file is deleted and only
snapshots exist? That is why counting snapshots against the disk quota makes
sense.
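To illustrate the concern, here is a minimal sketch of the accounting gap. It is
not HDFS code: the class QuotaModel and all of its methods are hypothetical, and
it assumes a deleted file keeps occupying disk space for as long as any snapshot
still references it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; not HDFS code.
class QuotaModel {
    private final Map<String, Long> fileSizes = new HashMap<>(); // bytes held on disk per file
    private final Set<String> live = new HashSet<>();            // files visible in the current tree
    private final Set<String> snapshotted = new HashSet<>();     // files captured by some snapshot

    void create(String file, long bytes) {
        fileSizes.put(file, bytes);
        live.add(file);
    }

    // Capture every currently live file in a snapshot.
    void snapshot() {
        snapshotted.addAll(live);
    }

    void delete(String file) {
        live.remove(file);
        // Space is reclaimed only when no snapshot still references the file.
        if (!snapshotted.contains(file)) {
            fileSizes.remove(file);
        }
    }

    // Bytes charged if only live files count against the quota.
    long liveUsage() {
        long total = 0;
        for (String f : live) {
            total += fileSizes.get(f);
        }
        return total;
    }

    // Bytes actually held: live files plus files retained only by snapshots.
    long usageIncludingSnapshots() {
        long total = 0;
        for (long bytes : fileSizes.values()) {
            total += bytes;
        }
        return total;
    }
}
```

In this model, deleting a snapshotted file drops liveUsage() to zero while
usageIncludingSnapshots() is unchanged, which is the space a quota that ignores
snapshots would fail to account for.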
bq. First, I'm concerned with the O(# of files + # of directories) nature of
this design, both in terms of time taken to create a snapshot and the NN memory
resources consumed.
I agree with you on this. We wanted to begin with this approach and then
optimize the in-memory representation further. The initial patch uploaded here
attempted premature optimization of both memory and snapshot creation time, and
that made the code really complicated. Reducing this cost is a definite goal,
and we will update that part of the design as the work continues. It is covered
in the open issues/future work section.
comment 1:
Agree with this part. As we continue the work, we can make a decision on this.
For supporting RW, let's not make the design/implementation more complicated.
comment 2:
Will address this as we continue to add more details to the design in the next
update.
Comment 3, 6:
I want to make sure you understand this is an early design and we will continue to
add more details. I think some of the questions will be answered by how this
works:
- An admin can mark directories as snapshottable using the CLI.
- A user can then create snapshots of these directories using the CLI/API. A
snapshot has a name, and that name is unique for a given snapshot root.
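The two steps above can be sketched as a small in-memory model. This is only an
illustration of the intended flow, not the HDFS implementation or its API: the
class SnapshotRegistry and the methods allowSnapshot, createSnapshot and
hasSnapshot are hypothetical names, and permission checks are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the flow described above; not HDFS code.
class SnapshotRegistry {
    // Snapshottable directory path -> names of the snapshots taken at that root.
    private final Map<String, Set<String>> snapshotsByRoot = new HashMap<>();

    // Admin step: mark a directory as snapshottable.
    void allowSnapshot(String dir) {
        snapshotsByRoot.putIfAbsent(dir, new LinkedHashSet<>());
    }

    // User step: create a named snapshot under a snapshottable root.
    void createSnapshot(String dir, String name) {
        Set<String> names = snapshotsByRoot.get(dir);
        if (names == null) {
            throw new IllegalStateException(dir + " is not snapshottable");
        }
        // The name must be unique within this snapshot root.
        if (!names.add(name)) {
            throw new IllegalArgumentException(
                "snapshot " + name + " already exists under " + dir);
        }
    }

    boolean hasSnapshot(String dir, String name) {
        Set<String> names = snapshotsByRoot.get(dir);
        return names != null && names.contains(name);
    }
}
```

Uniqueness is scoped to the snapshot root in this sketch, so the same snapshot
name can be reused under a different snapshottable directory.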
comment 4:
If you look at snapshot implementations in other systems, snapshots are taken
at the volume level. That is the parallel we are talking about.
Comment 5, Comment 7, comment 10:
With regard to consistency (comment 7), a system where the snapshot is taken at
the namespace without involving the data layer cannot provide a strong
consistency guarantee. I also think it may not be relevant where the writers are
different from the client that is taking the snapshot. I am not sure what
guarantee such a client can expect or depend on, given that the writers are
separate. We could discuss this during design review. Based on discussion with a
few HBase folks, I also think they should be okay with it, but it is something
to discuss with them. I am also not clear on their dependency on HDFS with
hbase-6055.
comment 8:
This could change during implementation if we think access time may not be that
important to maintain.
comment 9:
Agreed. I am leaning towards allowing it.
comment 11:
Will add use cases.
comment 12:
See the volume comment above; the document also partly covers this. We could
discuss this further if the document is not clear.
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>
> Key: HDFS-2802
> URL: https://issues.apache.org/jira/browse/HDFS-2802
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, name-node
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
>
>
> Snapshots are point in time images of parts of the filesystem or the entire
> filesystem. Snapshots can be a read-only or a read-write point in time copy
> of the filesystem. There are several use cases for snapshots in HDFS. I will
> post a detailed write-up soon with more information.