[ 
https://issues.apache.org/jira/browse/KUDU-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956582#comment-16956582
 ] 

Alexey Serbin commented on KUDU-2962:
-------------------------------------

I think the best approach would be something like Method #1, since it doesn't 
involve the master, and that means more up-to-date information and fewer 
surprises.  That's because the source of truth for the Raft consensus state is 
the leader replica of the tablet, whereas the master gets updates only with the 
heartbeats it receives from tablet servers.  Also, that might work better in 
test scenarios where the master might be temporarily down.  

As you described, the implementation is pretty straightforward:
# get the information on tablet replicas using {{GetConsensusState()}} and form 
a list of replica UUIDs from {{committed_config().peers()}}
# build the intersection between the set of supplied tserver UUIDs and the peers 
(e.g., using {{std::set_intersection()}} or the like)
# make sure the size of the intersection is {{peers.size()}}
# remove the element that represents the tablet server that hosts the leader 
replica
# return the result
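
A minimal sketch of steps 2-5, just to illustrate the set logic.  It is not 
the actual Kudu code: {{ReplicaInfo}} and {{FindFollowerUuids()}} are 
hypothetical names, UUIDs are plain strings, and the struct is assumed to be 
already filled in from the {{GetConsensusState()}} response (step 1).  Treating 
a leader that is not among the peers as an error is also my assumption.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical stand-in for the data taken from the committed Raft config;
// this is not a real Kudu type.
struct ReplicaInfo {
  std::set<std::string> peer_uuids;  // UUIDs from committed_config().peers()
  std::string leader_uuid;           // UUID of the tserver hosting the leader
};

// Hypothetical helper. On success, stores the UUIDs of the tablet servers
// hosting follower replicas in 'followers' and returns true. Returns false
// and leaves 'followers' untouched if the supplied tablet servers do not
// cover every peer, or if the leader is not among the peers.
bool FindFollowerUuids(const std::set<std::string>& tserver_uuids,
                       const ReplicaInfo& info,
                       std::set<std::string>* followers) {
  // Step 2: intersect the supplied tserver UUIDs with the peers. std::set
  // iterates in sorted order, which std::set_intersection() requires.
  std::set<std::string> hosting;
  std::set_intersection(tserver_uuids.begin(), tserver_uuids.end(),
                        info.peer_uuids.begin(), info.peer_uuids.end(),
                        std::inserter(hosting, hosting.end()));

  // Step 3: sanity check that every peer is present in the supplied set.
  if (hosting.size() != info.peer_uuids.size()) {
    return false;
  }

  // Step 4: drop the tablet server that hosts the leader replica.
  if (hosting.erase(info.leader_uuid) == 0) {
    return false;  // assumption: no known leader among the peers is an error
  }

  // Step 5: return the result.
  followers->swap(hosting);
  return true;
}
```

With the 10-tserver, 3-replica example from the issue description, passing all 
10 UUIDs yields only the two followers, and passing a set that misses one of 
the peers fails the check from step 3.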

Step 3 might be optional if there are use cases where it's necessary to get the 
list of replicas from a set of tservers that doesn't host the whole set of 
tablet peers.  I hope that's not the case and step 3 will be mandatory as a 
sanity check on the input parameters. 

> Fix kudu::itest::FindTabletFollowers() test utility function
> ------------------------------------------------------------
>
>                 Key: KUDU-2962
>                 URL: https://issues.apache.org/jira/browse/KUDU-2962
>             Project: Kudu
>          Issue Type: Improvement
>          Components: test
>            Reporter: Alexey Serbin
>            Assignee: Bankim Bhavsar
>            Priority: Minor
>              Labels: newbie
>
> The {{kudu::itest::FindTabletFollowers()}} function is unsafe: it uses 
> {{kudu::itest::FindTabletLeader()}} to generate the result as a complement to 
> tablet servers hosting the leader replica, but it doesn't sanitize the set of 
> tablet servers to make sure it contains only tablet servers hosting replicas 
> of the specified tablet.
> For example, if you have a cluster with 10 tablet servers, and a tablet with 
> 3 tablet replicas, passing the map for all tablet servers in the 10-node 
> cluster would result in {{FindTabletFollowers()}} reporting 9 followers.  
> Whoops!
> It's necessary to either fix the implementation of this utility function to 
> sanitize its first argument, or simply get rid of it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
