[
https://issues.apache.org/jira/browse/HDFS-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229457#comment-14229457
]
Vinayakumar B commented on HDFS-7454:
-------------------------------------
bq. I actually think there is value in both optimizations. For Vinay's use case
of a default ACL on a popular directory, it's more realistic to think of a
larger ACL entry list than 3. A default ACL can never be fewer than 5 entries,
so every directory will get a full copy of that + the access ACL entries. The
files will only get the access ACL entries, so a number of 2-3 makes sense
there. We could say 5 across both files and directories for a rough cut, which
would put the earlier example at 6 GB. De-duplication could effectively reduce
this to just 2 distinct AclFeature instances: 1 for all directories and 1 for
all files, so the memory usage would be almost unnoticeable.
I feel this is a fairly common use case for ACLs, where a top-level directory
has default entries set by the admin, which are later copied down to its
children.
bq. I would suggest making the move to an int representation though rather than
de-duplicating the individual entries. De-duplication is really valuable at the
level of the whole AclFeature instance. The 2 optimizations don't necessarily
need to be coupled to one another. They could be done in 2 different patches.
Yes, this is what I feel as well. Sure, if necessary these could be done in
separate patches.
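To illustrate the int-representation idea discussed above, here is a minimal sketch of packing a single ACL entry into one int using bit fields. The bit layout, class name, and field widths below are assumptions for illustration only, not the actual HDFS encoding:

```java
// Hypothetical sketch: one ACL entry packed into a single int.
// Assumed layout (illustrative, not the real HDFS format):
//   bits 0-2   permission (rwx)
//   bit  3     scope (0 = ACCESS, 1 = DEFAULT)
//   bits 4-5   type (0 = USER, 1 = GROUP, 2 = MASK, 3 = OTHER)
//   bits 6-31  id of an interned user/group name
public final class AclEntryInt {
    static int encode(int type, int scope, int nameId, int perm) {
        return (perm & 0x7) | ((scope & 0x1) << 3) | ((type & 0x3) << 4) | (nameId << 6);
    }
    static int perm(int e)   { return e & 0x7; }
    static int scope(int e)  { return (e >> 3) & 0x1; }
    static int type(int e)   { return (e >> 4) & 0x3; }
    static int nameId(int e) { return e >>> 6; }

    public static void main(String[] args) {
        // DEFAULT group entry with permission r-x (5) and name id 42.
        int e = encode(1, 1, 42, 5);
        assert perm(e) == 5;
        assert scope(e) == 1;
        assert type(e) == 1;
        assert nameId(e) == 42;
        System.out.println("encoded entry: " + e);
    }
}
```

The point of the int form is that an entry list becomes a plain int[] instead of a list of objects, which is both smaller and trivially comparable for the whole-AclFeature de-duplication.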
bq. Vinay, based on your observed usage pattern, what do you think is the best
option for proceeding with these 2 possible optimization paths? Ideally, we'd
drive the choice from a real-world use case.
As I said, this looks like a common use case, and I feel it is better to
implement. Others' opinions are also appreciated.
bq. I fear I may have caused more confusion than help by posting an initial
patch using the Guava interner. It turns out that a full interning
implementation really isn't necessary, because we can trust that all ACL
modification operations are executing under the namesystem write lock. All we
really need is some logic over a set data structure to check for existence of
an identical prior AclFeature and reuse it. It's a much simpler code change
than what my initial patch hinted at.
Yes, sure. I will post this in the next patch.
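As a rough sketch of the simpler, non-Interner approach described in the quote above (all names and fields here are illustrative stand-ins, not the actual HDFS classes): keep a map of unique instances and, before attaching a new AclFeature, look up and reuse an identical existing one. Because all ACL modifications are assumed to run under the namesystem write lock, an unsynchronized HashMap is sufficient:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of set-based AclFeature de-duplication.
// Callers are assumed to hold the namesystem write lock, so no
// synchronization or Guava Interner is needed.
public final class GlobalAclSet {
    // Stand-in for the real AclFeature: equality over the entry list.
    static final class AclFeature {
        final List<Integer> entries; // e.g. int-encoded ACL entries
        AclFeature(List<Integer> entries) { this.entries = entries; }
        @Override public boolean equals(Object o) {
            return o instanceof AclFeature
                && entries.equals(((AclFeature) o).entries);
        }
        @Override public int hashCode() { return entries.hashCode(); }
    }

    private static final Map<AclFeature, AclFeature> UNIQUE = new HashMap<>();

    // Reuse an identical prior instance if one exists; otherwise register
    // this one. Must be called with the namesystem write lock held.
    // A real implementation would also reference-count entries so they can
    // be removed from the map when the last referring inode drops its ACL.
    static AclFeature intern(AclFeature f) {
        AclFeature existing = UNIQUE.putIfAbsent(f, f);
        return existing != null ? existing : f;
    }

    public static void main(String[] args) {
        AclFeature a = intern(new AclFeature(List.of(1, 2, 3)));
        AclFeature b = intern(new AclFeature(List.of(1, 2, 3)));
        assert a == b; // identical ACLs now share one instance
        System.out.println("deduplicated");
    }
}
```

With this in place, the "1 instance for all directories and 1 for all files" outcome from the quoted memory estimate follows directly: every inode with an equal ACL ends up pointing at the same object.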
> Implement Global ACL Set for memory optimization in NameNode
> ------------------------------------------------------------
>
> Key: HDFS-7454
> URL: https://issues.apache.org/jira/browse/HDFS-7454
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Attachments: HDFS-7454-001.patch
>
>
> HDFS-5620 indicated that a GlobalAclSet containing unique {{AclFeature}}
> instances could be used to de-duplicate ACLs and save memory in the
> NameNode. However, it was not implemented at that time.
> This Jira re-proposes the same implementation, along with de-duplication of
> unique {{AclEntry}} objects across all ACLs.
> One simple use case is:
> A MapReduce user's home directory with a set of default ACLs, under which
> many other files/directories are created when jobs run. Here all the
> default ACLs of the parent directory will be duplicated until those ACLs
> are explicitly deleted. With de-duplication, only one object will be in
> memory for the same entry across all ACLs of all files/directories.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)