----- Original Message -----
> From: "Benoit GEORGELIN, web4all" <benoit.george...@web4all.fr>
> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> Sent: Tuesday, 28 March 2017 11:20:48
> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> ----- Original Message -----
> > From: "David Favor" <da...@davidfavor.com>
> > To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> > Sent: Monday, 27 March 2017 12:55:09
> > Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> > Serge E. Hallyn wrote:
> >> On Tue, Mar 14, 2017 at 02:29:01AM +0100, Benoit GEORGELIN – Association
> >> Web4all wrote:
> >>> ----- Original Message -----
> >>>> From: "Simos Xenitellis" <simos.li...@googlemail.com>
> >>>> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> >>>> Sent: Monday, 13 March 2017 20:22:03
> >>>> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers
> >>>> On Sun, Mar 12, 2017 at 11:28 PM, Benoit GEORGELIN – Association Web4all
> >>>> <benoit.george...@web4all.fr> wrote:
> >>>>> Hi lxc-users, I would like to know if you have any experience with a
> >>>>> large number of LXC/LXD containers, in terms of performance, stability,
> >>>>> and limitations. I'm wondering, for example, whether 100 containers
> >>>>> behave the same as 1,000 or 10,000 with the same configuration, setting
> >>>>> aside what each container actually runs. I have been looking around for
> >>>>> a couple of days for user/admin feedback, but I'm not able to find any
> >>>>> large deployments. Are there any resource limits, or a maximum number
> >>>>> of containers that can be deployed on the same node? Beyond the
> >>>>> physical performance of the node, is there any specific behavior that a
> >>>>> large number of LXC/LXD containers can run into? I'm not aware of any
> >>>>> test or limit that could occur besides the number of processes, but I'm
> >>>>> sure LXC/LXD itself must have some technical constraints, perhaps in
> >>>>> namespace availability or some other technical layer LXC/LXD uses. I
> >>>>> would be interested to hear about your experience, or any
> >>>>> links/books/stories about such large deployments.
> >>>> This would be interesting to hear, if someone can talk publicly about
> >>>> their large deployment. In any case, it should be possible to create,
> >>>> for example, 1000 web servers and then try to access each one, checking
> >>>> for any issues with response time. Another test would be to install
> >>>> 1000 WordPress installations and check again for response time and
> >>>> resource usage. Scripts to create this massive number of containers
> >>>> would also be helpful, to replicate any issues in order to solve them.
> >>>> Simos
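The response-time sweep suggested above could be sketched in plain shell. This is a hypothetical helper, not from the thread: `slow_responders` reads lines of `ip seconds` (which could be produced with `lxc list` plus `curl -w '%{time_total}'`, shown in the comment) and flags any response slower than a threshold.

```shell
# Hypothetical sketch of the per-container response-time check.
# slow_responders reads "ip seconds" lines on stdin and prints only
# the entries whose response time exceeds the threshold (seconds).
slow_responders() {
    awk -v max="$1" '$2 + 0 > max { print $1, $2 }'
}

# Example producer (requires LXD and curl; illustrative only):
#   for ip in $(lxc list --format csv -c 4 | cut -d' ' -f1); do
#       t=$(curl -o /dev/null -s -w '%{time_total}' "http://$ip/")
#       printf '%s %s\n' "$ip" "$t"
#   done | slow_responders 0.5
```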
> > Been reading this + here's a bit of info.

> > I've been running LXC since early deployment + now LXD.

> > There are a few big performance killers related to WordPress. If you keep 
> > these
> > issues in mind, you'll be good.

> > 1) I run 100s of sites across many containers on many machines.
> > My business is private, high speed hosting, so I eat from my efforts.
> > No theory here.
> > I target WordPress site speed at 3000+ reqs/second, measured locally
> > using ab (ApacheBench). This is a crude tool + sufficient: I issue up
> > to 10,000,000 keep-alive requests across 5 concurrent connections
> > against a server for 30 seconds:
> > ab -k -t 30 -n 10000000 -c 5 $URL
> > This will crash most machines, unless they're tuned well.

> > 2) Memory + CPU. The big killer of performance anywhere is swap thrash.
> > If top shows swapping for more than a few seconds, likely your system is
> > heading toward a crash.
> > Fix: I tend to deploy OVH machines with 128G of memory, as this is enough
> > memory to handle huge spikes of memory usage across many sites during
> > traffic spikes... then recover...
> > For example, running 100s of sites across many LXD containers, I've had
> > machines sustain 250,000+ reqs/hour every day for months.
> > At these traffic levels, <1 core used sustained + 50%ish memory use.
> > Sites still show 3000+ reqs/sec using the ab test above.
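Watching for that swap pressure needs nothing beyond the Linux /proc filesystem. A minimal check (the interpretation threshold is up to you):

```shell
# Print total and free swap in MiB straight from /proc/meminfo.
# If SwapFree keeps shrinking under load, you are entering the
# "swap thrash" territory described above.
awk '/^SwapTotal:|^SwapFree:/ { printf "%s %.0f MiB\n", $1, $2 / 1024 }' /proc/meminfo
```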

> > 3) Database: I run MariaDB rather than MySQL as it's smokin' fast.
> > I also relocate /tmp to tmpfs, so temp file i/o runs at memory speed
> > rather than disk speed.
> > This ensures all MariaDB temp select set files (for complex selects)
> > generate + access at memory speed.
> > Also, PHP session /tmp files run at memory speed.
> > This is important to me, as many of my clients run large membership
> > sites, many with >40K members. These sites' performance would circle
> > the drain if /tmp were on disk.
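Relocating /tmp to tmpfs takes a single fstab line. The size below is an illustrative assumption, not from the thread; cap it so a runaway temp file can't eat all your RAM:

```
# /etc/fstab — mount /tmp in RAM (size=4g is an example cap)
tmpfs  /tmp  tmpfs  defaults,noatime,size=4g,mode=1777  0  0
```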

> > 4) Disk Thrash: Becomes the killer as traffic increases.

> > 5) Apache Logging: For several clients I'm currently retuning my Apache
> > logging to skip logging of successful serves of images, css, js, and
> > fonts. I'll still log non-200s, as these need to be debugged.
> > This can make a huge difference if memory pressure/use forces disk writes
> > to actually go to disk, rather than to kernel filesystem i/o buffers.
> > Once memory pressure forces physical disk writes, disk i/o starves Apache
> > from quickly serving uncached content. Very ugly.
> > Right now I'm doing extensive filesystem testing to reduce disk thrash
> > during traffic spikes + the related memory pressure.
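One way to get that selective logging on Apache 2.4 — a sketch, not David's actual config; the paths, extension list, and use of ap_expr conditional logging are my assumptions:

```apache
# Mark requests for static assets, then log everything except
# 200-status responses to those assets; failed asset requests and
# all non-asset traffic still reach the access log.
SetEnvIf Request_URI "\.(png|jpe?g|gif|ico|css|js|woff2?|ttf)$" asset
CustomLog /var/log/apache2/access.log combined "expr=!(reqenv('asset') == '1' && %{REQUEST_STATUS} -eq 200)"
```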

> > 6) Net Connection: If you're running 1000s of containers, best also check
> > adapter saturation. I use 10Gig adapters + even at extreme traffic levels,
> > they barely reach 10% saturation.
> > This means 10Gig adapters are a must for me: 10% of 10Gig is 1Gig, so on
> > 1Gig adapters site speed would begin to throttle on adapter saturation,
> > which would be a bear to debug.
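Adapter saturation can be eyeballed straight from /proc/net/dev, with no extra tooling — a rough sketch; sample the counters twice, a second apart, and the delta is your current bytes/second to compare against the adapter's line rate:

```shell
# Print per-interface rx/tx byte counters from /proc/net/dev.
# Lines 1-2 are headers; splitting on ":" and spaces leaves the
# interface name in $2, rx bytes in $3, and tx bytes in $11.
awk -F'[: ]+' 'NR > 2 { print $2, "rx_bytes=" $3, "tx_bytes=" $11 }' /proc/net/dev
```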

> > 7) Apache: I've taken to setting up Apache to kill off processes after
> > anywhere from 10K to 100K requests served. This ensures the kernel can
> > garbage collect (resource reclamation), which also helps escape swapping.
> > If you have 100,000s+ Apache processes running with no kill off, then
> > eventually they can potentially eat up a massive amount of memory, which
> > takes a long time to reclaim, depending on other MPM config settings.
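With the prefork MPM that kill-off is a single directive — the figure below is illustrative (the text says anywhere from 10K to 100K works, depending on workload):

```apache
# Recycle each Apache child after 20000 connections so its memory
# is returned to the kernel; 0 would mean "never recycle".
<IfModule mpm_prefork_module>
    MaxConnectionsPerChild 20000
</IfModule>
```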

> > So… general rule of thumb: tune your entire WAMPL stack to run out of
> > memory (from RAM, not disk):
> > WAMPL - WordPress running on Apache + PHP + MariaDB + Linux

> > If your sites run at memory speed, it makes no real difference how many
> > containers you run. Possibly context switching might come into play if
> > many of the sites running were high traffic sites.

> > If problems occur, just look at your Apache logs across all containers. 
> > Move the
> > site with highest traffic to another physical machine.

> > Or, if top shows swapping, add more memory.

> Hi David,
> Interesting feedback; it's good to know the details you gave (memory/swap).
> Happy hosting ;)

By any chance, were you in Montreal today? There was an event about
security and large LXD deployments; I missed @stgraber's tweet about it (
https://twitter.com/stgraber/status/849106252764520453 ).
It would be nice to hear what that large LXD deployment was about :)
Thanks!
_______________________________________________
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users