In order to start 1000-node clusters in GCE you'll have to request more
quota for at least the following resources:

* Cores in a given region
* Disks
* Firewall rules
* Routes

Probably I'm missing something :)
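As a back-of-the-envelope sketch of how those quotas scale with node count (the per-node figures below are illustrative assumptions, not official numbers, and firewall rules are left out because they depend on cluster configuration rather than node count):

```python
# Rough quota estimate for an N-node GCE cluster. The resource list
# mirrors the message above; 4 cores and 1 disk per node are assumptions
# for illustration only.

def estimate_quota(nodes, cores_per_node=4, disks_per_node=1):
    return {
        "cores": nodes * cores_per_node,
        "disks": nodes * disks_per_node,
        # Kubernetes on GCE programs one route per node for pod traffic,
        # so the route quota must comfortably exceed the node count.
        "routes": nodes,
    }

print(estimate_quota(1000))
# {'cores': 4000, 'disks': 1000, 'routes': 1000}
```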
We're running a number of load tests on GCE where we start 1000-node
clusters, and it works without problems.

-- Filip

On Fri, Jul 22, 2016 at 7:35 AM, Juho Mäkinen <[email protected]> wrote:

>> So as others have said, this is not a problem on GCE, shouldn't be a
>> problem with some of the alternative network providers on AWS, and is
>> really only an issue with the default VPC provider on AWS.
>
> Are you sure? GCE says this: "The overall route quota for a project is
> currently 100, and each subnet created uses one route against the quota."
> (at https://cloud.google.com/compute/docs/networking). Also the Flannel
> documentation says the same for the gce networking module: "Note:
> Currently, GCE limits the number of routes for every project to 100."
> (at https://github.com/coreos/flannel/).
>
>> I am personally looking at a number of options that may be a half-way
>> house between VPC routing configuration (where AWS imposes the 50-node
>> limit) and the full network providers. For example, setting up a GRE
>> mesh is simple and performs relatively well (though it doesn't work on
>> GCE, I believe).
>
> Doesn't a full GRE mesh mean that each node would have to have a GRE
> tunnel to every other node? So a 1000-node cluster would have a million
> tunnels. That doesn't sound very scalable to me :( That being said, I
> don't know why a GRE tunnel wouldn't work on GCE.
>
> I haven't yet studied the IPIP module enough to understand how it would
> behave on a big mesh like that - it might be the same thing.
>
>> I also did some experiments and found that IPSEC over UDP performed
>> surprisingly well. IPSEC over UDP is interesting because it's
>> potentially universal: I don't know of any network that doesn't support
>> UDP, and encryption means you aren't assuming a secure network either.
>> I'd love to talk to you about which networking options make the most
>> sense to someone at your scale, Garo.
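For reference, the arithmetic behind the "million tunnels" concern about a full GRE mesh is pure counting; the only input is the node count:

```python
# Tunnel counts for a full mesh of n nodes, where every node peers with
# every other node.

def mesh_tunnels(n):
    per_node = n - 1               # GRE interfaces each node must configure
    endpoints = n * (n - 1)        # tunnel interfaces across the whole fleet
    links = n * (n - 1) // 2       # distinct point-to-point links
    return per_node, endpoints, links

print(mesh_tunnels(1000))
# (999, 999000, 499500) -- roughly the "million tunnels" from the thread
```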
>> And that goes for anyone else that might have more insight into which
>> networking options make the most sense!
>
>> Also, I investigated further, and AWS only supports layer 2 in a single
>> VPC subnet. So on AWS we can have layer 2 networking, or HA clusters,
>> but not both at the same time.
>
> Yep. To me it seems that I will need to use encapsulation anyway:
> Flannel has its own udp encapsulation, Calico uses the IPIP kernel
> module and Weave has its own encapsulation. If I restricted my setup to
> just a single VPC in AWS my options would be much more open, as I
> wouldn't need to use tunnels, but the reality is that I need multiple
> availability zones, so pretty much every piece of documentation,
> benchmarking and tutorial that doesn't use encapsulation is useless to
> me, and that's what makes this so hard. For example, Weave has a nice
> blog post, "Container networking with no overlay on AWS VPC" (
> https://www.weave.works/container-networking-no-overlay-aws-vpc/),
> which promises blazingly fast speeds, but the fine print is that it
> uses the VPC routing table, thus limiting my cluster to 50 machines.
>
> After talking with a few Calico developers, they said that it might be
> easy enough to implement support for Calico to use IPIP only across VPC
> subnets and L2 connectivity within a single availability zone, as that
> would use the most optimised path between hosts in all cases. Calico
> also has a nice implementation of network access lists which is
> compatible with the new Kubernetes 1.3 features, thus allowing ACLs
> between pods.
>
> Flannel with udp encapsulation is an interesting choice as well. I
> haven't yet found any good benchmarks comparing the udp encapsulation
> against Calico's IPIP, but I need to do my own benchmarks anyway.
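For context, flannel of that era read its network configuration from etcd (under the `/coreos.com/network/config` key by default), which is also why its behaviour during an etcd outage matters. A minimal sketch of a udp-backend configuration might look like this; the `10.244.0.0/16` CIDR is an illustrative example, and 8285 is flannel's documented default udp port:

```json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "udp",
    "Port": 8285
  }
}
```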
> Also I haven't yet found a good explanation of how flannel behaves if
> the etcd cluster goes down (we have had that issue a couple of times in
> our production, though things have gotten better and better as etcd has
> matured over the past couple of years).
>
>> Relatedly (if you're not aware of it), Kubernetes added the first
>> pieces of federation in 1.3, and you might consider whether you
>> actually want a few clusters of 256 nodes each (for example).
>> Advantages are that you could span datacenters / providers, and that
>> you're better able to tolerate control plane failures.
>
> This is definitely interesting. I'm planning to go multi-datacenter (so
> I would also need multi-datacenter network connectivity at some level),
> but I don't want to extend my control plane across regions.
>
>> Justin
>>
>> ( @justinsb on the k8s slack )
>>
>> On Thursday, July 21, 2016 at 12:46:42 PM UTC-4, Tim Hockin wrote:
>>>
>>> Flannel should work on AWS at that scale.
>>>
>>> Justin (Mr. k8s on AWS) mentioned he was exploring an alternate
>>> solution to the AWS static routes. VPC has an L2 domain, doesn't it?
>>> If so, something like Calico should work (no overlay).
>>>
>>> On Thu, Jul 21, 2016 at 6:53 AM, Rodrigo Campos <[email protected]>
>>> wrote:
>>> > I think on GCE or GKE you can do this easily. It doesn't use
>>> > flannel, etc. (you can, but it's not the default). It uses the
>>> > Google equivalent of the AWS VPC, so I guess it doesn't have those
>>> > limits AWS has. In fact, a 1000/2000-VM cluster is used for several
>>> > blog posts on GKE and it works just fine.
>>> >
>>> > The AWS VPC has the limit, but I'm not sure flannel will be an
>>> > issue. The CoreOS guys use that, so I'd be really surprised if it
>>> > was an issue on a 1000-VM cluster.
>>> >
>>> > So, on GCE or GKE it should just work. And on AWS, it probably
>>> > should just work if you use CoreOS, at least.
>>> > And you can easily install CoreOS with kube-aws, a tool CoreOS
>>> > created.
>>> >
>>> > On Thursday, July 21, 2016, Juho Mäkinen <[email protected]> wrote:
>>> >>
>>> >> I'm evaluating Kubernetes and I'm struggling to find good examples
>>> >> and solutions for how Kubernetes can be deployed into AWS so that
>>> >> the cluster has at least 1000 virtual machines.
>>> >>
>>> >> I have been reading up on pretty much all of the suggested
>>> >> networking layers: flannel, weave, calico and a few others, but
>>> >> they all have some limitations which I'm worried about: either
>>> >> their performance is sub-optimal, they suggest using AWS route
>>> >> tables (which limits the instance count to 50-100), or they have
>>> >> some other limitations which feel too restrictive when I'm aiming
>>> >> for over 1000 virtual machines.
>>> >>
>>> >> I'd like to hear some success stories from other users about how
>>> >> they have built big Kubernetes installations.
>>> >>
>>> >> - Garo
--
You received this message because you are subscribed to the Google Groups
"Containers at Google" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-containers.
For more options, visit https://groups.google.com/d/optout.
