[ https://issues.apache.org/jira/browse/HADOOP-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620734#action_12620734 ]

Allen Wittenauer commented on HADOOP-3637:
------------------------------------------

Overall, the proposal sounds very promising to me. A few comments, though:

Requirements
===========

Requirements #4 vs. Non-goal #2: "Only a few of those snapshots will be 
accessed simultaneously"

What happens if it turns out that finding the data one is looking for is 
expensive?  In other words, what is Plan B?

On busy systems, I can very easily see many users searching for data in 
different snapshots at the same time, especially when FUSE is involved.

Requirements #5:

While I understand what you are saying and why the requirement exists :), it 
would be good to make sure this is very well documented.

[An aside... I never thought of directed graphs as a mathematical construct.  
At least, I was taught them as part of my CS courses, which were distinct from 
the math courses.  Hmm.]

"Special Number 500"
=================

Why 500?  That seems particularly arbitrary.  I would recommend starting at a 
digit boundary: 1000, 10000, 100, whatever.

Namedir Structure
==============

What happens when the number of snapshots gets large?  Is there any concern 
about things like directory name lookup caches at the (UNIX) file system level 
running into problems?  Would it be a good idea to support a multilevel hashed 
directory structure now, or should we wait until someone needs it?
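To make the multilevel hashed idea concrete, here is a rough sketch, assuming 
snapshot directories are keyed by snapshot name and bucketed by a hash prefix.  
The function name snapshot_dir, the choice of MD5, and the two-level, 
two-hex-character layout are all hypothetical and not part of the proposal.

```python
import hashlib
import os


def snapshot_dir(root: str, snapshot_name: str, levels: int = 2, width: int = 2) -> str:
    """Return a multilevel hashed path for a snapshot directory.

    Hypothetical layout: each intermediate directory is named after a
    `width`-character slice of the hex MD5 digest of the snapshot name,
    so no single directory has to hold every snapshot.
    """
    # MD5 is used only to spread names evenly, not for security.
    digest = hashlib.md5(snapshot_name.encode("utf-8")).hexdigest()
    # Successive slices of the digest become the bucket directory names.
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *buckets, snapshot_name)


if __name__ == "__main__":
    # Prints something like /namedir/snapshots/<2 hex>/<2 hex>/snap-0042
    print(snapshot_dir("/namedir/snapshots", "snap-0042"))
```

With two levels of two hex characters each, there are 256 x 256 = 65,536 leaf 
buckets, so even a few thousand snapshots leave each directory with only a 
handful of entries.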

Appending to Files
==============

I am a bit concerned about the "wait for some period" part.  We've noticed 
that when the file system gets full at the UNIX level, the name node behaves 
erratically while it hunts for free space.  Now, clearly the name node should 
be better behaved in this sort of edge-case scenario.  But I'm wondering what 
the client should do when it retries after waiting for the NN to COW the block 
under such conditions.

> Support for snapshots
> ---------------------
>
>                 Key: HADOOP-3637
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3637
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: Snapshots.pdf
>
>
> Support HDFS snapshots. It should support creating snapshots without shutting 
> down the file system. Snapshot creation should be lightweight and a typical 
> system should be able to support a few thousand concurrent snapshots. There 
> should be a way to surface (i.e. mount) a few of these snapshots 
> simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
