[ https://issues.apache.org/jira/browse/HADOOP-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272366#comment-17272366 ]
Daryn Sharp commented on HADOOP-17079:
--------------------------------------
I seriously question the design decision to break an interface and then
redundantly modify every single group provider impl to return a Set with the
additional _implicit_ expectation that providers _must_ return a LinkedHashSet
(so primary group works). Why was it not sufficient to just modify the
GroupCacheLoader#load to convert the list to a set for the cache?

As evidenced by the JNI netgroup bug, I would not be surprised if this large &
seemingly unnecessary change has added more subtle bugs. I suggest reverting
all the group provider impl changes.
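
For illustration, the alternative raised above (leave the providers returning a
List and convert once when the cache entry is loaded) might look roughly like
the sketch below. This is not the actual Hadoop code: ListGroupProvider and
load() are hypothetical stand-ins for the real provider interface and for
GroupCacheLoader#load, and the group names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GroupSetCacheSketch {

    /** Hypothetical stand-in for a group provider that still returns a List. */
    interface ListGroupProvider {
        List<String> getGroups(String user);
    }

    /**
     * Hypothetical cache-load step: convert the provider's list into an
     * insertion-ordered set once, so providers need no changes.
     */
    static Set<String> load(ListGroupProvider provider, String user) {
        List<String> groups = provider.getGroups(user);
        // LinkedHashSet keeps first-seen order, so the primary group (the
        // first list element) stays first; duplicates are dropped.
        return Collections.unmodifiableSet(new LinkedHashSet<>(groups));
    }

    public static void main(String[] args) {
        // Made-up provider data; "staff" plays the role of the primary group.
        ListGroupProvider provider =
            user -> Arrays.asList("staff", "hadoop", "staff", "admins");
        Set<String> cached = load(provider, "alice");
        System.out.println("cached groups: " + cached);
        System.out.println("primary group: " + cached.iterator().next());
    }
}
```

With this shape the ordering guarantee lives in one place (the loader) rather
than being an implicit contract on every provider implementation.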
> Optimize UGI#getGroups by adding UGI#getGroupsSet
> -------------------------------------------------
>
> Key: HADOOP-17079
> URL: https://issues.apache.org/jira/browse/HADOOP-17079
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HADOOP-17079.002.patch, HADOOP-17079.003.patch,
> HADOOP-17079.004.patch, HADOOP-17079.005.patch, HADOOP-17079.006.patch,
> HADOOP-17079.007.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> UGI#getGroups has been optimized with HADOOP-13442 by avoiding the
> List->Set->List conversion. However, the returned list is not optimized for
> contains() lookups, especially when the user's group membership list is huge
> (thousands+). This ticket is opened to add a UGI#getGroupsSet and use
> Set#contains() instead of List#contains() to speed up large group lookups
> while minimizing List->Set conversions in the Groups#getGroups() call.
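
A minimal sketch of the lookup difference the description refers to, using
plain java.util collections rather than the real UGI API. The class name, the
helper methods and the generated group names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GroupLookupSketch {

    /** Builds n made-up group names: group0, group1, ... */
    static List<String> makeGroups(int n) {
        List<String> groups = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            groups.add("group" + i);
        }
        return groups;
    }

    /** List#contains scans the elements one by one: O(n) per lookup. */
    static boolean isMemberViaList(List<String> groups, String group) {
        return groups.contains(group);
    }

    /** Set#contains on a hash-based set is a hash lookup: O(1) expected. */
    static boolean isMemberViaSet(Set<String> groups, String group) {
        return groups.contains(group);
    }

    public static void main(String[] args) {
        List<String> asList = makeGroups(5000);
        // One-time conversion; insertion order (primary group first) is kept.
        Set<String> asSet = new LinkedHashSet<>(asList);

        String wanted = "group4999"; // worst case for the list scan
        System.out.println(isMemberViaList(asList, wanted));
        System.out.println(isMemberViaSet(asSet, wanted));
    }
}
```

Both calls return the same answer; the difference is only in how much work
each lookup does when the membership list runs to thousands of entries.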