Xiang Li commented on HBASE-19917:

Thanks for your comment [~yuzhih...@gmail.com]!
{{filterServers()}} is only called in 
{{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
return filterServers(RSGroupInfo.getServers(), onlineServers);
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet 
actually, built by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, 
size of servers is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because 
TreeSet#contains() is logn and we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is 
O(m + n), as the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O(n) for time complexity (if I get 
it correctly) as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * 
I think #1 is good enough, although it is worse than #2 which is linear. What 
is your opinion?

bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the 
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
TreeSet is only used for display purpose. I am checking if HashSet could be 
used to replace TreeSet throughout the calling chain.

> Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient
> ---------------------------------------------------------------------
>                 Key: HBASE-19917
>                 URL: https://issues.apache.org/jira/browse/HBASE-19917
>             Project: HBase
>          Issue Type: Improvement
>          Components: rsgroup
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-19917.master.000.patch
> {code:title=hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java|borderStyle=solid}
> private List<ServerName> filterServers(Collection<Address> servers,
>     Collection<ServerName> onlineServers) {
>   ArrayList<ServerName> finalList = new ArrayList<ServerName>();
>   for (Address server : servers) {
>     for(ServerName curr: onlineServers) {
>       if(curr.getAddress().equals(server)) {
>         finalList.add(curr);
>       }
>     }
>   }
>   return finalList;
> }
> {code}
> filterServers is to return the union of servers and onlineServers. The 
> current implementation has time complexity as O(m * n) (2 loops), could be in 
> O(m + n) if HashSet is used. The trade-off is space complexity is increased.
> Another point which could be improved: filterServers() is only called in 
> filterOfflineServers(). filterOfflineServers calls filterServers(Set, List). 
> The current filterServers(Collection, Collection) seems could be improved.

This message was sent by Atlassian JIRA

Reply via email to