My initial guess, without looking, is that the client pool is sending a ping to each ServerLocation over only one of the available Connections. This logic should be changed to send a ping to each unique member, since ServerLocation is not unique anymore.
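For illustration only, a minimal sketch of that idea (the names here are hypothetical, not the actual pool internals): track live connections by member id and send one ping per member, rather than one per ServerLocation.

// Hypothetical sketch, not the actual pool internals: key the ping
// bookkeeping on the member id instead of on ServerLocation, so that two
// receivers behind the same host/port each receive their own ping.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MemberPinger {
  interface Connection {
    void sendPing();
  }

  // Keyed by member id (assumed unique), not by host:port.
  private final Map<String, Connection> connectionsByMemberId = new ConcurrentHashMap<>();

  void track(String memberId, Connection connection) {
    connectionsByMemberId.putIfAbsent(memberId, connection);
  }

  // Called on the ping interval: one ping per distinct member, even if
  // several members resolve to the same ServerLocation.
  void pingAll() {
    connectionsByMemberId.values().forEach(Connection::sendPing);
  }
}

With two receivers behind the same host and port, pingAll() would still reach both, because each member id keeps its own entry.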
-Jake

> On Jan 27, 2020, at 8:55 AM, Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
>
> Hi again,
>
> Status update: the simplification of the maps suggested by Jacob made the new proposed class containing the ServerLocation and the member id unnecessary. With this refactoring, replication is working in the scenario we have been discussing in this conversation. That's great, and I think the code can be merged into develop if there are no extra comments in the PR.
>
> But this does not mean we can say that Geode is able to work properly when using gw receivers with the same IP + port. We have seen that when working with this configuration, there is a problem with the pings sent from gw senders (which act as clients) to the gw receivers (servers). The pings are reaching just one of the receivers, so the sender-receiver connection is finally closed by the ClientHealthMonitor.
>
> Do you have any suggestion about how to handle this issue? My first idea was to identify where the connection is created, to check if the sender could be made aware in some way that there is more than one server to which the ping should be sent, but I'm not sure if that is possible. The alternative could be to change the ClientHealthMonitor to be "clever" enough to not close connections in this case. Any comment is welcome 🙂
>
> Thanks,
>
> Alberto B.
>
> From: Jacob Barrett <jbarr...@pivotal.io>
> Sent: Wednesday, January 22, 2020 19:01
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Anilkumar Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
>>> On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
>>>
>>> Thanks Naba & Jacob for your comments!
>>>
>>> @Naba: I have been implementing a solution as you suggested, and I think it would be convenient if the client knows the memberId of the server it is connected to.
>>>
>>> (current code is here: https://github.com/apache/geode/pull/4616 )
>>>
>>> For example, in:
>>>
>>> LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation currentServer, String group, Set<ServerLocation> excludedServers)
>>>
>>> In this method, the client has sent the ServerLocation, but if that object does not contain the memberId, I don't see how to guarantee that the replacement that will be returned is not the same server the client is currently connected to.
>>> Inside that method, this other method is called:
>>
>> Given that your setup is masquerading multiple members behind the same host and port (ServerLocation), it doesn't matter. When the pool opens a new socket to the replacement server, it will be to the shared hostname and port, and the Kubernetes service at that host and port will just pick a backend host. In the solution we suggested, we preserved that behavior, since the k8s service can't determine which backend member to route the connection to based on the member id.
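As an aside, here is a minimal sketch of why host/port alone can't tell the masqueraded members apart; the class below is a simplified stand-in, not the real ServerLocation.

// Simplified stand-in for a host/port pair (not the real Geode class):
// equality is host+port only, so two members behind the same Kubernetes
// service collapse into one "location" and cannot be told apart here.
import java.util.Objects;

final class HostPortKey {
  final String host;
  final int port;

  HostPortKey(String host, int port) {
    this.host = host;
    this.port = port;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof HostPortKey)) {
      return false;
    }
    HostPortKey other = (HostPortKey) o;
    return port == other.port && host.equals(other.host);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port);
  }

  public static void main(String[] args) {
    // Two gateway receivers exposed through the same service address:
    HostPortKey receiverA = new HostPortKey("gw-receivers.example", 5000);
    HostPortKey receiverB = new HostPortKey("gw-receivers.example", 5000);
    // Indistinguishable by host/port; whichever backend actually gets the
    // socket is decided by the service in front, not by this key.
    System.out.println(receiverA.equals(receiverB)); // prints: true
  }
}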
>> LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer, groupServers)
>>
>> where groupServers is a "Map<ServerLocationAndMemberId, LoadHolder>" object. If the keys of that map have the same host and port, they are only different in the memberId. But as you don't know it (you just have currentServer, which contains host and port), you cannot get the correct LoadHolder value, so you cannot know if your server is the most loaded.
>
> Again, given your use case, the behavior of this method is lost when a new connection is established by the pool through the shared hostname anyway.
>
>> @Jacob: I think the solution finally implies that the client has to know the memberId; I think we could simplify the maps.
>
> The client isn't keeping these load maps, the locator is, and the locator knows all the member ids. The client end only needs to know the host/port combination. In your example, the WAN replication (a client to the remote cluster) connects to the shared host/port service and gets randomly routed to one of the backend servers in that service.
>
> All of this locator balancing code is unnecessary in this model, where something else is choosing the final destination. The goal of our proposed changes was to recognize that all we need is to make sure the locator keeps the shared ServerLocation alive in its responses to clients, by tracking the associated members and reducing that set to the set of unique ServerLocations. In your case that will always reduce to 1 ServerLocation for N members, as long as 1 member is still up.
>
> -Jake
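To make the reduction Jake describes in the quoted paragraph above concrete, here is a rough sketch under the same assumptions (hypothetical names, not the actual LocatorLoadSnapshot code): the locator tracks which members sit behind each shared address and keeps advertising that address while at least one member is alive.

// Hypothetical sketch of "reduce N members to one unique ServerLocation":
// a "host:port" string stands in for ServerLocation; the location is only
// dropped from client responses when its last member goes away.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SharedLocationTracker {
  private final Map<String, Set<String>> membersByLocation = new HashMap<>();

  void memberJoined(String hostPort, String memberId) {
    membersByLocation.computeIfAbsent(hostPort, k -> new HashSet<>()).add(memberId);
  }

  void memberLeft(String hostPort, String memberId) {
    Set<String> members = membersByLocation.get(hostPort);
    if (members == null) {
      return;
    }
    members.remove(memberId);
    if (members.isEmpty()) {
      // Last member behind this address is gone: stop advertising it.
      membersByLocation.remove(hostPort);
    }
  }

  // What would be handed back to clients: N members behind one address
  // reduce to a single shared location, as long as one member is still up.
  Set<String> advertisedLocations() {
    return membersByLocation.keySet();
  }
}

In such a sketch the locator would call memberJoined/memberLeft from its membership events and build client responses from advertisedLocations().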