[ https://issues.apache.org/jira/browse/HDFS-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229454#comment-14229454 ]

Chris Nauroth commented on HDFS-7454:
-------------------------------------

I actually think there is value in both optimizations.  For Vinay's use case of 
a default ACL on a popular directory, it's more realistic to assume an ACL 
entry list larger than 3.  A default ACL can never have fewer than 5 entries, 
so every directory will get a full copy of that plus the access ACL entries.  
The files will only get the access ACL entries, so an estimate of 2-3 entries 
makes sense there.  As a rough cut, we could assume 5 entries across both files 
and directories, which would put the earlier example at 6 GB.  De-duplication 
could effectively reduce this to just 2 distinct {{AclFeature}} instances: 1 
for all directories and 1 for all files, so the memory usage would become 
almost unnoticeable.
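
To make the entry counts concrete: a minimal default ACL of the kind Vinay 
describes might look like the sketch below, built with the public 
{{AclEntry.Builder}} API.  The "mapred" user name and the specific permissions 
are just illustrative assumptions.

{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class DefaultAclExample {
  public static void main(String[] args) {
    // Illustrative minimal default ACL: one named entry plus the unnamed
    // user, group, mask, and other entries -- 5 entries on the directory.
    List<AclEntry> defaultAcl = Arrays.asList(
        entry(AclEntryType.USER, null, FsAction.ALL),             // default:user::rwx
        entry(AclEntryType.USER, "mapred", FsAction.ALL),         // default:user:mapred:rwx (assumed)
        entry(AclEntryType.GROUP, null, FsAction.READ_EXECUTE),   // default:group::r-x
        entry(AclEntryType.MASK, null, FsAction.ALL),             // default:mask::rwx
        entry(AclEntryType.OTHER, null, FsAction.READ_EXECUTE));  // default:other::r-x
    System.out.println(defaultAcl);
  }

  private static AclEntry entry(AclEntryType type, String name, FsAction perm) {
    AclEntry.Builder b = new AclEntry.Builder()
        .setScope(AclEntryScope.DEFAULT)
        .setType(type)
        .setPermission(perm);
    if (name != null) {
      b.setName(name);
    }
    return b.build();
  }
}
{code}

Every directory under the parent carries its own copy of entries like these 
today, which is exactly what the de-duplication would collapse to a single 
shared instance.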

I would suggest moving to an {{int}} representation, though, rather than 
de-duplicating the individual entries.  De-duplication is really valuable at 
the level of the whole {{AclFeature}} instance.  The 2 optimizations don't 
necessarily need to be coupled to one another; they could be done in 2 
different patches.
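
As a rough illustration of what an {{int}} representation could look like 
(this is only a sketch of one possible bit layout, not a committed format), 
the scope, type, permission bits, and a small interned name id can be packed 
into a single {{int}}:

{code:java}
/**
 * Sketch of packing one ACL entry into an int.  The field widths and the idea
 * of referencing entry names by a small interned id are assumptions made for
 * illustration; they are not the actual on-heap format.
 */
public final class AclEntryIntCodec {
  // Bit layout (low to high): 3 permission bits, 2 type bits, 1 scope bit,
  // and the remaining 26 bits for a name id (0 = unnamed entry).
  private static final int PERM_BITS = 3;
  private static final int TYPE_BITS = 2;
  private static final int SCOPE_BITS = 1;

  public static int encode(int scope, int type, int perm, int nameId) {
    return perm
        | (type << PERM_BITS)
        | (scope << (PERM_BITS + TYPE_BITS))
        | (nameId << (PERM_BITS + TYPE_BITS + SCOPE_BITS));
  }

  public static int perm(int encoded)   { return encoded & 0x7; }
  public static int type(int encoded)   { return (encoded >>> PERM_BITS) & 0x3; }
  public static int scope(int encoded)  { return (encoded >>> (PERM_BITS + TYPE_BITS)) & 0x1; }
  public static int nameId(int encoded) { return encoded >>> (PERM_BITS + TYPE_BITS + SCOPE_BITS); }
}
{code}

An {{AclFeature}} could then hold an {{int[]}} instead of a list of entry 
objects, which is where the per-entry object overhead would go away.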

Vinay, based on your observed usage pattern, what do you think is the best 
option for proceeding with these 2 possible optimization paths?  Ideally, we'd 
drive the choice from a real-world use case.

bq. With these numbers, the scheme seems a pretty good thing to have before we 
really think of getting into the mud of implementing an interner.

I fear I may have caused more confusion than help by posting an initial patch 
using the Guava interner.  It turns out that a full interning implementation 
really isn't necessary, because we can trust that all ACL modification 
operations execute under the namesystem write lock.  All we really need is 
some logic over a set data structure to check whether an identical prior 
{{AclFeature}} already exists and reuse it.  It's a much simpler code change 
than my initial patch hinted at.
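
For reference, the kind of change I have in mind is roughly the sketch below: 
a plain map consulted whenever an ACL is set, relying on the namesystem write 
lock for thread safety.  The class and method names are hypothetical, and it 
assumes equality is defined over the entry list.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the de-duplication logic, written generically here;
 * in the NameNode, T would be AclFeature.  All callers are assumed to hold
 * the namesystem write lock, so no additional synchronization is used.
 * Relies on T implementing equals()/hashCode() over its entries.
 */
public class DedupSet<T> {
  // Maps each value to the single canonical instance equal to it.
  private final Map<T, T> unique = new HashMap<>();

  /**
   * Returns the existing identical instance if one is already registered;
   * otherwise registers and returns the given instance.
   */
  public T getOrAdd(T value) {
    T existing = unique.get(value);
    if (existing != null) {
      return existing;  // reuse the prior identical instance
    }
    unique.put(value, value);
    return value;
  }

  /**
   * Called when an inode drops its ACL.  Real code would need to
   * reference-count uses before removing; this sketch omits that.
   */
  public void remove(T value) {
    unique.remove(value);
  }
}
{code}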

> Implement Global ACL Set for memory optimization in NameNode
> ------------------------------------------------------------
>
>                 Key: HDFS-7454
>                 URL: https://issues.apache.org/jira/browse/HDFS-7454
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-7454-001.patch
>
>
> HDFS-5620 indicated that a GlobalAclSet containing unique {{AclFeature}} 
> instances could be used to de-duplicate ACLs and save memory in the NameNode. 
> However, it was not implemented at that time.
> This Jira re-proposes the same implementation, along with de-duplication of 
> unique {{AclEntry}} objects across all ACLs.
> One simple use case is:
> A MapReduce user's home directory with a set of default ACLs, under which a 
> lot of other files/directories are created when jobs run. Here, all the 
> default ACLs of the parent directory will be duplicated until those ACLs are 
> explicitly deleted. With de-duplication, only one object will be in memory 
> for the same entry across all ACLs of all files/directories.



