This is very helpful, thanks, this makes sense!

If services are layer 4 though, what does service.spec.sessionAffinity do?

If I'm understanding you, NGinx and HAProxy become useful things inside the cluster to provide layer 7 LB, whereas otherwise a more application/pod specific perspective (including client IP http headers) will be lost with the layer 4 services provided by Kubernetes? If so, I guess putting HAProxy or NGinx outside of the cluster would be somewhat limited since traffic would still need to pass through the layer 4 services?

Rodrigo Campos <rodrig...@gmail.com>
May 15, 2017 at 8:08 PM

On Sunday, May 14, 2017, Joe Auty <joea...@gmail.com> wrote:

    Sorry for such a vague subject, but I think I need some help
    breaking things down here.

    I think I understand how the Google layer 7 LBs work (this diagram
    helped me:
    https://storage.googleapis.com/static.ianlewis.org/prod/img/750/gcp-lb-objects2.png),
    I understand NGinx and HAProxy LBs independently, and I believe
    I also understand the concepts of NodePort, Ingress controllers,
    services, etc.

    What I don't understand is why when I research things like
    socket.io architectures in Kubernetes (for
    example), or features like IP whitelisting, session affinity, etc.
    I see people putting NGinx or HAProxy into their clusters. It is
    hard for me to keep straight all of the different levels of load
    balancing and their controls:

      * Google backend services (i.e. Google LB)
      * Kubernetes service LB
      * HAProxy/NGinx


    The rationale for HAProxy and NGinx seems to involve compensating
    for missing features and/or bugs (kube-proxy, etc.) and it is hard
    to keep straight what is a reality today and what the best path is?

Services are layer 4, so there is no layer 7 session affinity with them (the sessionAffinity field only offers ClientIP affinity, which works at L4 on the source address). Except that in some cases annotations (just some keywords in the service YAML) can be used: for example, rudimentary support for some L7 features was added to the service load balancer in AWS via annotations.
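For illustration, here is a sketch of what such an annotation looks like on a Service manifest. The annotation key is one of the real AWS load-balancer annotations; the service name, labels, and ports are made up for the example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service name
  annotations:
    # Tell the AWS cloud provider that the backends speak plain HTTP --
    # an L7-ish detail exposed through an annotation rather than
    # through the Service spec itself.
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```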

But as services are supposed to be L4, they aren't really the right layer for layer 7 features that are specific to HTTP.

So that's where Ingress comes in. Ingress is layer 7, so it knows about this stuff (or can know :)).
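To make the L7 part concrete, a minimal Ingress sketch (the host, paths, and service names are hypothetical; the apiVersion is the extensions/v1beta1 one Kubernetes used around this time):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: app.example.com        # L7: route by the HTTP Host header
    http:
      paths:
      - path: /api               # L7: route by URL path
        backend:
          serviceName: api-svc   # hypothetical backing Service
          servicePort: 80
      - path: /
        backend:
          serviceName: web-svc   # hypothetical backing Service
          servicePort: 80
```

An Ingress controller (e.g. one built on NGinx or HAProxy) watches these objects and configures the actual L7 proxy, which is exactly why those proxies show up inside clusters.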

That is the chronological evolution of things, which might shed some light on why things are the way they are now.


    Google's LBs support session affinity, and there are session
    affinity Kubernetes service settings, so for starters, when and
    why is NGinx or HAProxy necessary, and are there outstanding
    issues with tracking source IPs and setting/respecting proper headers?


When you use a service of type NodePort, it works like this: some port X is opened on all the hosts. When a packet arrives at that port, it is routed to the appropriate pods in a round-robin fashion.
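As a sketch, a NodePort Service manifest (the names are made up; if nodePort is omitted, Kubernetes picks one from its configured range, 30000-32767 by default):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: NodePort
  selector:
    app: my-app        # pods matching this label receive the traffic
  ports:
  - port: 80           # the Service's cluster-internal port
    targetPort: 8080   # the container port on the pods
    nodePort: 30080    # "port X", opened on every node in the cluster
```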

When you create a service of type LoadBalancer, you want to balance the load between pods, not between instances, since there might be more than one pod on an instance, or more instances than pods. So how do you do this?

If the load balancer doesn't know about pods and only knows about instances, then you can't give it the task of balancing between pods. And if you did give it that responsibility, it would have to route only to the subset of instances running the pods, and that subset might change often.

So it is handled like this: the service types build one on top of the other, so a LoadBalancer is also a NodePort. The load balancer's backends are configured to be ALL the Kubernetes nodes, on that service's nodePort. When a packet arrives at the LB, some instance is chosen for it, and on that node kube-proxy (which runs on every node) does the trick of forwarding it to one of the service's pods (possibly on another node), in a balanced fashion. Since every service has a different nodePort, it's easy for kube-proxy to know which pods a packet is for: it just looks at the port the packet arrived on.
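The "build one over the other" part is visible in the API object itself: a Service of type LoadBalancer also gets a nodePort allocated. A sketch (the name, ports, and external IP are invented for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer   # implies NodePort behavior as well
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 31234    # allocated automatically if not set; the cloud LB
                       # is configured with all nodes as backends on it
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10 # example external IP filled in by the cloud provider
```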

But that has a price: from the pod's point of view, the L4 connection might appear to come from a node in the cluster (that is just kube-proxy doing the "redirection"), and the real source IP is lost.

In some protocols, like HTTP, there is a header (X-Forwarded-For) that can carry the original client address, so the application still sees what looks like the real client IP. But for other protocols this can be a problem.

There is a proposal (and an implementation, I think) to preserve the source IP with services. I believe it basically avoids sending the packet to a node that isn't running the pod in the first place, so the redirection isn't needed, but I'm not sure. And I think it only works with the Google load balancer for now.
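For reference, in Kubernetes of roughly this era the mechanism was exposed as a beta annotation (it later became the spec.externalTrafficPolicy field). A hedged sketch of the annotation form as I understand it; check the documentation for your Kubernetes version before relying on the exact key:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # Only deliver external traffic to pods on the node that received it,
    # skipping the kube-proxy hop so the client source IP survives.
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```

The trade-off is that nodes without a matching pod drop the traffic, so the cloud LB needs health checks to route only to nodes that have one.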

Does this clarify something?

Sorry if I'm not clear; I'm not good at expressing myself sometimes. So please let me know if anything is still confusing :)


    I'm happy to get into what sort of features I need if this will
    help steer the discussion, but at this point I'm thinking maybe it
    is best to start at a more basic level where you treat me like I'm
    6 years old :)


Quite advanced questions for a 6 year old :)
--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group. To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscr...@googlegroups.com. To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.
Joe Auty <joea...@gmail.com>
May 14, 2017 at 1:28 PM
Sorry for such a vague subject, but I think I need some help breaking things down here.

I think I understand how the Google layer 7 LBs work (this diagram helped me: https://storage.googleapis.com/static.ianlewis.org/prod/img/750/gcp-lb-objects2.png) , I understand NGinx and HAProxy LBs independently, and I believe I also understand the concepts of NodePort, Ingress controllers, services, etc.

What I don't understand is why when I research things like socket.io architectures in Kubernetes (for example), or features like IP whitelisting, session affinity, etc. I see people putting NGinx or HAProxy into their clusters. It is hard for me to keep straight all of the different levels of load balancing and their controls:

  * Google backend services (i.e. Google LB)
  * Kubernetes service LB
  * HAProxy/NGinx


The rationale for HAProxy and NGinx seems to involve compensating for missing features and/or bugs (kube-proxy, etc.) and it is hard to keep straight what is a reality today and what the best path is?

Google's LBs support session affinity, and there are session affinity Kubernetes service settings, so for starters, when and why is NGinx or HAProxy necessary, and are there outstanding issues with tracking source IPs and setting/respecting proper headers?

I'm happy to get into what sort of features I need if this will help steer the discussion, but at this point I'm thinking maybe it is best to start at a more basic level where you treat me like I'm 6 years old :)

Thanks in advance!
