[ 
https://issues.apache.org/jira/browse/HDFS-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229457#comment-14229457
 ] 

Vinayakumar B commented on HDFS-7454:
-------------------------------------

bq. I actually think there is value in both optimizations. For Vinay's use case 
of a default ACL on a popular directory, it's more realistic to think of a 
larger ACL entry list than 3. A default ACL can never be fewer than 5 entries, 
so every directory will get a full copy of that + the access ACL entries. The 
files will only get the access ACL entries, so a number of 2-3 makes sense 
there. We could say 5 across both files and directories for a rough cut, which 
would put the earlier example at 6 GB. De-duplication could effectively reduce 
this to just 2 distinct AclFeature instances: 1 for all directories and 1 for 
all files, so the memory usage would be almost unnoticeable.
I feel this is a fairly common use case for ACLs: an admin sets default entries on a top-level directory, and those entries are later copied down to all of its children.
bq. I would suggest making the move to an int representation though rather than 
de-duplicating the individual entries. De-duplication is really valuable at the 
level of the whole AclFeature instance. The 2 optimizations don't necessarily 
need to be coupled to one another. They could be done in 2 different patches.
Yes, this is what I feel as well. And sure, if necessary they could be done in 
separate patches.
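To illustrate the int-representation idea mentioned above, here is a hedged sketch of packing one ACL entry's scope, type, permission, and name reference into a single int. The bit layout, field widths, and class name are my own assumptions for illustration; they are not the actual layout chosen for HDFS.

```java
public class PackedAclEntry {
    // Illustrative bit layout (an assumption, not the real HDFS format):
    //   bits 0-23  : name id (index into a shared name table)
    //   bits 24-26 : FsAction permission bits (rwx)
    //   bits 27-28 : entry type (user/group/mask/other)
    //   bit  29    : scope (0 = access, 1 = default)
    static int pack(int scope, int type, int perm, int nameId) {
        return (scope << 29) | (type << 27) | (perm << 24) | (nameId & 0xFFFFFF);
    }

    static int scope(int e)  { return (e >>> 29) & 0x1; }
    static int type(int e)   { return (e >>> 27) & 0x3; }
    static int perm(int e)   { return (e >>> 24) & 0x7; }
    static int nameId(int e) { return e & 0xFFFFFF; }

    public static void main(String[] args) {
        // default-scope mask entry with r-x permission for name #12345
        int e = pack(1, 2, 5, 12345);
        System.out.println(scope(e) + " " + type(e) + " "
            + perm(e) + " " + nameId(e)); // prints "1 2 5 12345"
    }
}
```

Since each entry becomes a plain int, an AclFeature can hold an int[] instead of a list of entry objects, which is where the per-entry memory saving would come from.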
bq. Vinay, based on your observed usage pattern, what do you think is the best 
option for proceeding with these 2 possible optimization paths? Ideally, we'd 
drive the choice from a real-world use case.
As I said, this looks like a common use case, and I feel it is better to implement. 
Others' opinions are also appreciated.

bq. I fear I may have caused more confusion than help by posting an initial 
patch using the Guava interner. It turns out that a full interning 
implementation really isn't necessary, because we can trust that all ACL 
modification operations are executing under the namesystem write lock. All we 
really need is some logic over a set data structure to check for existence of 
an identical prior AclFeature and reuse it. It's a much simpler code change 
than what my initial patch hinted at.
Yes, sure. I will post this in the next patch.
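The "logic over a set data structure" described above could be sketched roughly as follows: a reference-counted map whose keys serve as the canonical instances, with no internal locking because every caller is assumed to already hold the namesystem write lock. The class and method names here are illustrative assumptions, not the committed HDFS API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of de-duplicating whole AclFeature instances.
// Equal features map to one canonical object; a reference count
// lets fully-unreferenced features be dropped from the set.
class AclFeatureDeDup<T> {
    // key doubles as the canonical instance; value is the ref count
    private final Map<T, Integer> refCount = new HashMap<>();
    private final Map<T, T> canonical = new HashMap<>();

    // Assumed to run under the namesystem write lock, so a plain
    // HashMap (no Guava interner, no synchronization) is enough.
    T put(T feature) {
        T existing = canonical.putIfAbsent(feature, feature);
        T result = (existing != null) ? existing : feature;
        refCount.merge(result, 1, Integer::sum);
        return result;
    }

    void remove(T feature) {
        Integer count = refCount.get(feature);
        if (count == null) {
            return;
        }
        if (count == 1) {
            refCount.remove(feature);
            canonical.remove(feature);
        } else {
            refCount.put(feature, count - 1);
        }
    }

    int uniqueCount() { return canonical.size(); }

    public static void main(String[] args) {
        AclFeatureDeDup<String> set = new AclFeatureDeDup<>();
        // two equal-but-distinct objects collapse to one canonical copy
        String first = set.put(new String("user:hive:rwx"));
        String second = set.put(new String("user:hive:rwx"));
        System.out.println(first == second);      // prints "true"
        System.out.println(set.uniqueCount());    // prints "1"
    }
}
```

In the directory-tree use case from the discussion, all children sharing the same default-derived ACL would then reference one AclFeature object instead of thousands of copies.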

> Implement Global ACL Set for memory optimization in NameNode
> ------------------------------------------------------------
>
>                 Key: HDFS-7454
>                 URL: https://issues.apache.org/jira/browse/HDFS-7454
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-7454-001.patch
>
>
> HDFS-5620 indicated that a GlobalAclSet containing unique {{AclFeature}} 
> instances could be used to de-duplicate ACLs and save memory in the NameNode. 
> However, it was not implemented at that time.
> This Jira re-proposes the same implementation, along with de-duplication of 
> unique {{AclEntry}} objects across all ACLs.
> One simple use case is:
> A MapReduce user's home directory has a set of default ACLs, under which a 
> lot of other files/directories are created when jobs are run. All of the 
> parent directory's default ACL entries are duplicated on each child until 
> those ACLs are explicitly deleted. With de-duplication, only one object will 
> be in memory for the same entry across all ACLs of all files/directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)