[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370199#comment-16370199
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:
--------------------------------------------

Thanks [~shashikant] for working on this.  Some comments on the patch:
- Pass INodeDirectory as a parameter in getSumForRange(..).  Then, we could 
remove INodeDirectory dir from DirectoryDiffList.
- Let's replace getSumForRange with getMinListForRange in DiffList so that we 
may implement it in DiffListByArrayList using subList.
- diffSetIndexList does not seem useful since it is the same as the nodes in 
level 1.  BTW, diffSetIndexList is not updated when an element is removed, 
which seems to be a bug.  I suggest removing diffSetIndexList since it can be 
computed if necessary.
- TestDirectoryDiffList does not test remove(..).  As mentioned, remove(..) 
seems to have some bugs.
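
To illustrate the getMinListForRange suggestion, here is a minimal sketch of 
how an array-backed list could answer it with subList.  SimpleDiffList and 
SimpleDiffListByArrayList are hypothetical, simplified stand-ins, not the 
classes in the patch, and the method signature is an assumption (for example, 
it leaves out the INodeDirectory parameter mentioned above).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for the DiffList interface. */
interface SimpleDiffList<T> {
  boolean add(T diff);
  T remove(int index);
  int size();
  /** Returns the smallest list of diffs covering [fromIndex, toIndex). */
  List<T> getMinListForRange(int fromIndex, int toIndex);
}

/**
 * Array-backed variant (assumed shape): every diff is stored individually,
 * so the minimal list for a range is simply that sub-range.
 */
class SimpleDiffListByArrayList<T> implements SimpleDiffList<T> {
  private final List<T> diffs = new ArrayList<>();

  @Override public boolean add(T diff) { return diffs.add(diff); }
  @Override public T remove(int index) { return diffs.remove(index); }
  @Override public int size() { return diffs.size(); }

  @Override
  public List<T> getMinListForRange(int fromIndex, int toIndex) {
    // subList returns a view of the backing list; wrap it so that callers
    // cannot modify the stored diffs through the returned list.
    return Collections.unmodifiableList(diffs.subList(fromIndex, toIndex));
  }
}
```

Since subList returns a view, the returned list should be consumed before any 
later add or remove on the backing list; otherwise accessing it throws 
ConcurrentModificationException.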


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> --------------------------------------------------------------------
>
>                 Key: HDFS-13102
>                 URL: https://issues.apache.org/jira/browse/HDFS-13102
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs for a directory is large. 
> To construct the children list of any directory under a snapshot, it needs 
> to combine all the diffs from that particular snapshot up to the last 
> snapshotDiff record and reverseApply them to the current children list of 
> the directory on the live fs. This can take significant time if the number 
> of snapshot diffs is large and the changes per diff are significant.
> This Jira proposes to store the directory diffs in a SnapshotSkipList, where 
> we store multi-level DirectoryDiffs. At each level, the DirectoryDiff will 
> be the cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  
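
The idea in the description above can be sketched with a deliberately 
simplified two-level structure.  Everything here is an assumption made for 
illustration: integers stand in for DirectoryDiffs, addition stands in for 
combining two diffs, the block size K is arbitrary, and a real skip list would 
have more than two levels.  It is not the SnapshotSkipList from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: level 0 holds every diff, level 1 holds sums of K consecutive diffs. */
class TwoLevelDiffSketch {
  static final int K = 4;                          // assumed block size, not from the issue
  final List<Integer> level0 = new ArrayList<>();  // one entry per snapshot diff
  final List<Integer> level1 = new ArrayList<>();  // level1[b] = sum of level0[b*K .. b*K+K-1]
  int combineSteps;                                // diffs combined by the last query

  void add(int diff) {
    level0.add(diff);
    if (level0.size() % K == 0) {                  // a block of K diffs just became complete
      int sum = 0;
      for (int i = level0.size() - K; i < level0.size(); i++) {
        sum += level0.get(i);
      }
      level1.add(sum);
    }
  }

  /** Combines all diffs from index 'from' up to the latest one. */
  int combineFrom(int from) {
    int total = 0;
    combineSteps = 0;
    int i = from;
    while (i < level0.size()) {
      if (i % K == 0 && i / K < level1.size()) {
        total += level1.get(i / K);                // aligned with a complete block: use the cumulative diff
        i += K;
      } else {
        total += level0.get(i);                    // otherwise fall back to a single diff
        i++;
      }
      combineSteps++;
    }
    return total;
  }
}
```

With ten diffs added, combining from index 2 takes five combine steps in this 
sketch instead of the eight needed when walking level 0 alone.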



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
