Re: WAN replication issue in cloud native environments

Charlie Black Wed, 04 Dec 2019 09:12:58 -0800

Alberto,

Something else to think about: SNI-based routing. I believe Mario might be
working on adding SNI to Geode - he at least had a proposal that he
e-mailed out.


The basics are that the destination host is carried in the SNI field, so the
proxy can inspect it and route the request to the right service instance.
Plus, we have the option of not terminating SSL at the proxy.
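
To make the routing idea concrete, here is a rough Python sketch of how a
proxy could read the SNI host name from the first TLS record without
terminating SSL. The route table, host names and addresses are made up for
illustration, it assumes the whole ClientHello arrives in a single record,
and it is not how Geode or any particular proxy implements this.

```python
import struct
from typing import Optional, Tuple

# Hypothetical route table: SNI host name -> backend (address, port).
ROUTES = {
    "receiver-0.site-b.example": ("10.0.0.10", 5000),
    "receiver-1.site-b.example": ("10.0.0.11", 5000),
}


def extract_sni(data: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None."""
    # Record header: type(1) version(2) length(2); 0x16 means handshake.
    if len(data) < 5 or data[0] != 0x16:
        return None
    pos = 5
    # Handshake header: type(1) length(3); 0x01 means ClientHello.
    if len(data) < pos + 4 or data[pos] != 0x01:
        return None
    pos += 4
    pos += 2 + 32  # skip client version and random
    try:
        pos += 1 + data[pos]  # skip session id
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len  # skip cipher suites
        pos += 1 + data[pos]  # skip compression methods
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = min(pos + ext_total, len(data))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Layout: list length(2), name type(1), name length(2), name.
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                if name_type != 0 or pos + 5 + name_len > len(data):
                    return None
                return data[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def route(client_hello: bytes) -> Optional[Tuple[str, int]]:
    """Look up the backend for a ClientHello; None if there is no match."""
    return ROUTES.get(extract_sni(client_hello))
```

The proxy only peeks at the unencrypted ClientHello, so the bytes can then be
forwarded untouched to the chosen backend and the TLS session still ends at
the Geode member.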

Full disclosure - I haven't tried out SNI-based routing myself; it is
something that I thought could work as I was reading about it. From the
whiteboarding I have done, I think this will do ingress and egress just fine.
Potentially easier than port mapping and playing around with
`hostname for clients`.

Just something to think about.

Charlie


On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
<alberto.bustamante.re...@est.tech> wrote:

> Hi Jacob,
>
> Yes, we are using the LoadBalancer service type. But note that the problem
> is not in the transport layer but in Geode, as GW senders are complaining
> “sender-2-parallel : Could not connect due to: There are no active
> servers.” when one of the servers in the receiving cluster is killed.
>
> So, there is still one server alive in the receiving cluster, but the GW
> sender does not know it and the locator is not able to inform it of that
> server’s existence. Looking at the code, it seems the internal data
> structures (maps) holding the profiles use objects whose equality check
> relies only on hostname and port. This makes it impossible to differentiate
> servers when the same “hostname-for-senders” and port are used. When the
> killed server comes back up, the locator profiles are updated (internal map
> back to size()=1 although 2+ servers are there) and GW senders happily
> reconnect.
>
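
The collapse described above can be reproduced with a toy map in Python.
This is only an illustration of the keying problem, not Geode's actual
classes: the key functions, the member names and the host name are all
made up, and adding a member id to the key is just one hypothetical way to
tell the receivers apart.

```python
def host_port_key(member_id, host, port):
    # Mirrors the behaviour described above: identity is hostname + port only.
    return (host, port)


def member_key(member_id, host, port):
    # Hypothetical alternative: include a per-member id in the identity.
    return (member_id, host, port)


def simulate(key_fn):
    """Register two receivers behind one VIP:port, then remove server-0.

    Returns (entries while both are up, entries after server-0 leaves).
    """
    profiles = {}
    for member in ("server-0", "server-1"):
        profiles[key_fn(member, "gw.site-b.example", 5000)] = member
    both_up = len(profiles)
    profiles.pop(key_fn("server-0", "gw.site-b.example", 5000), None)
    return both_up, len(profiles)


print(simulate(host_port_key))  # (1, 0)
print(simulate(member_key))     # (2, 1)
```

With the host/port key the map never holds more than one entry, so removing
one server empties it even though the other server is still alive.
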
> With Geode as-is, the solution would be to expose each GW receiver on a
> different port outside of the k8s cluster. This includes creating N
> Kubernetes services for N GW receivers, in addition to updating the service
> mesh configuration (if one is used), firewalls, etc. The declarative nature
> of Kubernetes means we must know the ports in advance, hence start-port and
> end-port must be equal when creating each GW receiver, and we should have
> some well-known algorithm for assigning ports to GW receivers across
> servers. For example: server-0 port 5000, server-1 port 5001, server-2 port
> 5002, etc. So all GW receivers must be wired individually and we must turn
> off Geode’s random port allocation.
>
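
A small Python sketch of such a well-known port scheme, assuming member
names end in an ordinal (server-0, server-1, ...) and a base port of 5000 as
in the example above. The helper names are hypothetical, and the gfsh line
it prints should be checked against the Geode version in use before relying
on it.

```python
import re

BASE_PORT = 5000  # assumed base, taken from the example in the text


def receiver_port(member_name, base_port=BASE_PORT):
    """Derive a fixed receiver port from the trailing ordinal of the name."""
    match = re.search(r"-(\d+)$", member_name)
    if match is None:
        raise ValueError(f"no ordinal suffix in member name: {member_name!r}")
    return base_port + int(match.group(1))


def gfsh_command(member_name):
    """Build a gfsh command that pins start-port and end-port to one value."""
    port = receiver_port(member_name)
    return (
        f"create gateway-receiver --member={member_name} "
        f"--start-port={port} --end-port={port}"
    )


for name in ("server-0", "server-1", "server-2"):
    print(name, receiver_port(name))
```

Because start-port equals end-port, there is no range left for random
allocation, and the same ordinal can be used to name the matching Kubernetes
service for each receiver.
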
> But we are exploring the possibility for Geode to handle this cloud-native
> configuration a bit better. Locators should be capable of holding GW
> receiver information even though the receivers are hidden behind the same
> hostname and port. This is a code change in Geode and we would like to have
> the community’s opinion on it.
>
> One obvious impact on the legacy behavior: when the locator picks a server
> on behalf of the client (the GW sender in this case), it does so based on
> the server load. When the sender connects, and considering all servers are
> using the same VIP:PORT, it is the load balancer that decides where the
> connection ends up, which is likely not the server selected by the locator.
> So here we ignore the locator’s instructions. Since GW senders normally do
> not create a huge number of connections, this probably will not unbalance
> the cluster too much. But it is an impact worth considering. Custom load
> metrics would also be ignored by GW senders. Opinions?
>
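
A toy Python model of that mismatch, under loudly stated assumptions: the
locator is reduced to "pick the least-loaded server", the load balancer is
plain round-robin, and the load numbers are invented and never change. Real
locators and load balancers are more involved than this.

```python
from itertools import cycle


def locator_pick(loads):
    """Pick the least-loaded server, as a stand-in for the locator's choice."""
    return min(loads, key=loads.get)


def count_mismatches(loads, connections):
    """Count connections the load balancer sends somewhere else."""
    lb = cycle(sorted(loads))  # round-robin load balancer, unaware of load
    mismatches = 0
    for _ in range(connections):
        if next(lb) != locator_pick(loads):
            mismatches += 1
    return mismatches


loads = {"server-0": 0.9, "server-1": 0.1, "server-2": 0.5}
print(locator_pick(loads))         # server-1
print(count_mismatches(loads, 6))  # 4
```

In this model four of six connections land on a server other than the one
the locator recommended, which is the sense in which the locator's
instructions end up being ignored.
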
> An additional impact that comes to mind is the GW sender load-balance
> command and how its execution would be affected.
>
> Thanks!
>
> Alberto B.
>
> ________________________________
> From: Jacob Barrett <jbarr...@pivotal.io>
> Sent: Friday, November 29, 2019 13:06
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: WAN replication issue in cloud native environments
>
>
>
> > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes
> > <alberto.bustamante.re...@est.tech> wrote:
> >
> > The reason for such a setup is deploying Geode cluster on a Kubernetes
> > cluster where all GW receivers are reachable from the outside world on the
> > same VIP and port.
>
> Are you using LoadBalancer Service type?
>
> > Other kinds of configuration (different hostname and/or different port
> > for each GW receiver) are not cheap from an OAM and resources perspective
> > in cloud native environments and also limit some important use-cases (like
> > scaling).
>
> If you could somehow configure host and port for the sender (code
> modification required), would exposing each port through the LoadBalancer
> also be too expensive?
>
> > The problem experienced is that shutting down one server stops
> > replication to this cluster until the server is up again. We suspect this
> > is because Geode incorrectly assumes there are no more alive servers when
> > just one of them is down (since they share hostname-for-senders and port).
>
> Seems like, in the worst case, when it tries to reconnect the LB should
> give it a live server, and it would think the single server is back up.
>
> -Jake
>
>

-- 
Charlie Black | cbl...@pivotal.io
