[ https://issues.apache.org/jira/browse/HADOOP-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620734#action_12620734 ]
Allen Wittenauer commented on HADOOP-3637:
------------------------------------------

Overall, what has been proposed sounds very promising to me. Some comments, though:

Requirements
============

Requirement #4 vs. Non-goal #2: "Only a few of those snapshots will be accessed simultaneously"

What happens if it is determined that finding the data that one is looking for is expensive? In other words, what is Plan B? On busy systems, I can easily see many users searching for data in different snapshots, especially when FUSE is involved.

Requirement #5: While I understand what you are saying and why the requirement exists :), it would be good to make sure this is really well documented.

[An aside... I never thought of directed graphs as being a mathematical construct. At least I was taught them as part of my CS courses, which were distinct from the math courses. Hmm.]

"Special Number 500"
====================

Why 500? That seems particularly arbitrary. I would recommend starting at a digit boundary: 100, 1000, 10000, whatever.

Namedir Structure
=================

What happens when the number of snapshots gets large? Any concern about things like directory name lookup caches at the (UNIX) file system level having issues? Would it be a good idea to support a multilevel hashed structure now, or wait until someone needs it?

Appending to Files
==================

I have a bit of concern about the "wait for some period" bit. We've noticed that when the file system gets full at the UNIX level, the name node goes a bit spastic while it tries to hunt for free space. Now, clearly the name node should be better behaved in this sort of edge-case scenario. But I'm wondering what the client should do when it retries after waiting for the NN to COW the block under such conditions.
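
To make the Namedir Structure question above a bit more concrete, here is a rough sketch of what a multilevel hashed layout could look like. It is only an illustration: the function name, the root path, the snapshot ids, and the choice of two levels of two hex characters are all assumptions of mine, not anything taken from the design document.

```python
import hashlib
from pathlib import PurePosixPath


def snapshot_dir(root: str, snapshot_id: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a snapshot id to a nested directory under root.

    Hypothetical helper: each level is named by `width` hex characters of a
    hash of the id, so no single directory has to hold every snapshot.
    """
    digest = hashlib.sha256(snapshot_id.encode("utf-8")).hexdigest()
    if levels * width > len(digest):
        raise ValueError("levels * width exceeds the available hash length")
    # Slice consecutive chunks of the digest, one chunk per directory level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, snapshot_id)


if __name__ == "__main__":
    # Illustrative ids only; the real naming scheme is up to the design.
    for sid in ("snap-0001", "snap-0002", "snap-0003"):
        print(snapshot_dir("/name/snapshots", sid))
```

With two levels of two hex characters there are 256 * 256 leaf buckets, so even a few thousand snapshots would leave each directory with only a handful of entries. The cost is one hash computation per lookup and a layout that is harder to browse by hand.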
> Support for snapshots
> ---------------------
>
>                 Key: HADOOP-3637
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3637
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: Snapshots.pdf
>
>
> Support HDFS snapshots. It should support creating snapshots without shutting
> down the file system. Snapshot creation should be lightweight and a typical
> system should be able to support a few thousands concurrent snapshots. There
> should be a way to surface (i.e. mount) a few of these snapshots
> simultaneously.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.