My initial guess, without looking, is that the client pool is sending a ping to each ServerLocation over only one of the available Connections. This logic should be changed to send a ping to each unique member, since ServerLocation is not unique anymore.
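For illustration only, a minimal sketch of that idea (the names here are hypothetical, not the actual pool internals): track live connections by member id and send one ping per member, rather than one per ServerLocation.

// Hypothetical sketch, not the actual pool internals: key the ping
// bookkeeping on the member id instead of on ServerLocation, so that two
// receivers behind the same host/port each receive their own ping.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MemberPinger {
  interface Connection {
    void sendPing();
  }

  // Keyed by member id (assumed unique), not by host:port.
  private final Map<String, Connection> connectionsByMemberId = new ConcurrentHashMap<>();

  void track(String memberId, Connection connection) {
    connectionsByMemberId.putIfAbsent(memberId, connection);
  }

  // Called on the ping interval: one ping per distinct member, even if
  // several members resolve to the same ServerLocation.
  void pingAll() {
    connectionsByMemberId.values().forEach(Connection::sendPing);
  }
}

With two receivers behind the same host and port, pingAll() would still reach both, because each member id keeps its own entry.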
-Jake

> On Jan 27, 2020, at 8:55 AM, Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
>
> Hi again,
>
> Status update: the simplification of the maps suggested by Jacob made the new proposed class containing the ServerLocation and the member id unnecessary. With this refactoring, replication is working in the scenario we have been discussing in this conversation. That's great, and I think the code can be merged into develop if there are no extra comments in the PR.
>
> But this does not mean we can say that Geode is able to work properly when using gw receivers with the same IP + port. We have seen that when working with this configuration, there is a problem with the pings sent from gw senders (which act as clients) to the gw receivers (servers). The pings are reaching just one of the receivers, so the sender-receiver connection is finally closed by the ClientHealthMonitor.
>
> Do you have any suggestion about how to handle this issue? My first idea was to identify where the connection is created, to check if the sender could be made aware in some way that there is more than one server to which the ping should be sent, but I'm not sure if that is possible. The alternative could be to change the ClientHealthMonitor to be "clever" enough to not close connections in this case. Any comment is welcome 🙂
>
> Thanks,
>
> Alberto B.
>
> From: Jacob Barrett <jbarr...@pivotal.io>
> Sent: Wednesday, January 22, 2020 19:01
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Anilkumar Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
>>> On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
>>>
>>> Thanks Naba & Jacob for your comments!
>>>
>>> @Naba: I have been implementing a solution as you suggested, and I think it would be convenient if the client knows the memberId of the server it is connected to.
>>>
>>> (current code is here: https://github.com/apache/geode/pull/4616 )
>>>
>>> For example, in:
>>>
>>> LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation currentServer, String group, Set<ServerLocation> excludedServers)
>>>
>>> In this method, the client has sent the ServerLocation, but if that object does not contain the memberId, I don't see how to guarantee that the replacement that will be returned is not the same server the client is currently connected to.
>>> Inside that method, this other method is called:
>>
>> Given that your setup is masquerading multiple members behind the same host and port (ServerLocation), it doesn't matter. When the pool opens a new socket to the replacement server, it will be to the shared hostname and port, and the Kubernetes service at that host and port will just pick a backend host. In the solution we suggested, we preserved that behavior, since the k8s service can't determine which backend member to route the connection to based on the member id.
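As an aside, here is a minimal sketch of why host/port alone can't tell the masqueraded members apart; the class below is a simplified stand-in, not the real ServerLocation.

// Simplified stand-in for a host/port pair (not the real Geode class):
// equality is host+port only, so two members behind the same Kubernetes
// service collapse into one "location" and cannot be told apart here.
import java.util.Objects;

final class HostPortKey {
  final String host;
  final int port;

  HostPortKey(String host, int port) {
    this.host = host;
    this.port = port;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof HostPortKey)) {
      return false;
    }
    HostPortKey other = (HostPortKey) o;
    return port == other.port && host.equals(other.host);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port);
  }

  public static void main(String[] args) {
    // Two gateway receivers exposed through the same service address:
    HostPortKey receiverA = new HostPortKey("gw-receivers.example", 5000);
    HostPortKey receiverB = new HostPortKey("gw-receivers.example", 5000);
    // Indistinguishable by host/port; whichever backend actually gets the
    // socket is decided by the service in front, not by this key.
    System.out.println(receiverA.equals(receiverB)); // prints: true
  }
}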
>> LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer, groupServers)
>>
>> where groupServers is a "Map<ServerLocationAndMemberId, LoadHolder>" object. If the keys of that map have the same host and port, they are only different in the memberId. But as you don't know it (you just have currentServer, which contains host and port), you cannot get the correct LoadHolder value, so you cannot know if your server is the most loaded.
>
> Again, given your use case, the behavior of this method is lost when a new connection is established by the pool through the shared hostname anyway.
>
>> @Jacob: I think the solution finally implies that the client has to know the memberId; I think we could simplify the maps.
>
> The client isn't keeping these load maps, the locator is, and the locator knows all the member ids. The client end only needs to know the host/port combination. In your example, the WAN replication (a client to the remote cluster) connects to the shared host/port service and gets randomly routed to one of the backend servers in that service.
>
> All of this locator balancing code is unnecessary in this model, where something else is choosing the final destination. The goal of our proposed changes was to recognize that all we need is to make sure the locator keeps the shared ServerLocation alive in its responses to clients, by tracking the associated members and reducing that set to the set of unique ServerLocations. In your case that will always reduce to 1 ServerLocation for N members, as long as 1 member is still up.
>
> -Jake
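To make the reduction Jake describes in the quoted paragraph above concrete, here is a rough sketch under the same assumptions (hypothetical names, not the actual LocatorLoadSnapshot code): the locator tracks which members sit behind each shared address and keeps advertising that address while at least one member is alive.

// Hypothetical sketch of "reduce N members to one unique ServerLocation":
// a "host:port" string stands in for ServerLocation; the location is only
// dropped from client responses when its last member goes away.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SharedLocationTracker {
  private final Map<String, Set<String>> membersByLocation = new HashMap<>();

  void memberJoined(String hostPort, String memberId) {
    membersByLocation.computeIfAbsent(hostPort, k -> new HashSet<>()).add(memberId);
  }

  void memberLeft(String hostPort, String memberId) {
    Set<String> members = membersByLocation.get(hostPort);
    if (members == null) {
      return;
    }
    members.remove(memberId);
    if (members.isEmpty()) {
      // Last member behind this address is gone: stop advertising it.
      membersByLocation.remove(hostPort);
    }
  }

  // What would be handed back to clients: N members behind one address
  // reduce to a single shared location, as long as one member is still up.
  Set<String> advertisedLocations() {
    return membersByLocation.keySet();
  }
}

In such a sketch the locator would call memberJoined/memberLeft from its membership events and build client responses from advertisedLocations().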