[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968754#comment-14968754
 ] 

Yi Liu commented on HDFS-9053:
------------------------------

Thanks for the discussion, Nicholas.

{quote}
BTW, why do we need two arrays for elements and children. I think BTree 
implementation naturally needs only one array.
{quote}
Actually most implementations I saw use two array. 
One array stores the elements, and another array stores the reference to 
children nodes (childrenSize == elementsSize+1 for non leaf nodes). 
Having them in one array makes: 1) the array contains two different types, I 
think it's strange. 2) more complicated logic and not easy to understand.

{quote}
When the #children is small, ArrayList is just fine. Let's use BTree only when 
#children is larger than a threshold
{quote}
Yeah, I ever did this as you suggested, please see {{005}} patch, I added a 
{{SortedCollection}} which wraps a custom array list implementation and the 
btree. 
- I didn't do as your suggestion of "The field in INodeDirectory is List<INode> 
children which may refer to either an ArrayList or a BTreeList. We may replace 
the list at runtime", is because:  It's hard to use a List<Node>, since when we 
use ArrayList, we do searching before index/delete and access through index, 
but BTree is in-order and we don't access through index. 

I personally think this approach is a bit complex. Do you have further 
suggestion about it? Nicholas.

On the other hand, back to the btree, Jing, I see you are not in favor of btree 
extend Node, is it possible to make some change and then you can accept? Any 
suggestions?

Appreciate you two to spend time on the discussion, thanks a lot!

> Support large directories efficiently using B-Tree
> --------------------------------------------------
>
>                 Key: HDFS-9053
>                 URL: https://issues.apache.org/jira/browse/HDFS-9053
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>            Priority: Critical
>         Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch, 
> HDFS-9053.004.patch, HDFS-9053.005.patch, HDFS-9053.006.patch, 
> HDFS-9053.007.patch
>
>
> This is a long standing issue, we were trying to improve this in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are ordered in the list, for insert/delete, the time complexity is 
> O\(n), (the search is O(log n), but insertion/deleting causes re-allocations 
> and copies of arrays), for large directory, the operations are expensive.  If 
> the children grow to 1M size, the ArrayList will resize to > 1M capacity, so 
> need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) 
> continuous heap memory, it easily causes full GC in HDFS cluster where 
> namenode heap memory is already highly used.  I recap the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> re-allocations and copies of big arrays.
> # Dynamically allocate several MB continuous heap memory which will be 
> long-lived can easily cause full GC problem.
> # Even most children are removed later, but the directory INode still 
> occupies same size heap memory, since the ArrayList will never shrink.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to 
> solve the problem suggested by [~shv]. 
> So the target of this JIRA is to implement a low memory footprint B-Tree and 
> use it to replace ArrayList. 
> If the elements size is not large (less than the maximum degree of B-Tree 
> node), the B-Tree only has one root node which contains an array for the 
> elements. And if the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  It will solve the above 3 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to