I'd support adding these parameters in some form in /etc/init.d/cloud-early-config. Agreed that the OOM killer is of no use here.
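
To make that concrete, here is a rough sketch of one way it could be done. The function name and the drop-in path below are my own placeholders and are not taken from the existing cloud-early-config script; the three values are the ones proposed in the quoted mail.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the current cloud-early-config script.
# Writes the three proposed sysctl settings to a drop-in file.
# The default path is an assumption; pass a different one as $1.
write_panic_sysctl() {
    conf="${1:-/etc/sysctl.d/99-panic-on-oom.conf}"
    cat > "$conf" <<'EOF'
vm.panic_on_oom = 1
kernel.panic_on_oops = 1
kernel.panic = 10
EOF
}

# Possible use from an init script (needs root):
#   write_panic_sysctl && sysctl -p /etc/sysctl.d/99-panic-on-oom.conf
```

With kernel.panic = 10 the kernel reboots 10 seconds after a panic, so an OOM on the MASTER should end in a reboot and a keepalived failover rather than a half-dead router. Whether the settings belong in a drop-in file or should be applied directly by the init script is open for discussion.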

On 9/4/13 11:15 AM, "Roeland Kuipers" <rkuip...@schubergphilis.com> wrote:

>Hi Dev!
>
>We have experienced a serious customer outage due to the OOM killer on a
>redundant routing vm pair member. Somehow the MASTER node ran out of
>memory and the OOM killer decided to kill random processes, causing
>HAproxy to go down. But since keepalived was still running and
>functioning, a failover never happened.
>In our experience we would rather panic on OOM than pray that the
>OOM killer will do the right thing, while in 99% of cases
>it just renders a machine useless.
>If this RvR had panicked and rebooted, we would have had a nice
>keepalived failure/failover without much impact on our customer.
>
>So we figured we would configure the following sysctl options:
>  vm.panic_on_oom = 1
>  kernel.panic_on_oops = 1
>  kernel.panic = 10
>
>That way a VM panics and reboots after 10 seconds, so a router just comes
>back in a happy state instead of being crippled by the OOM killer.
>
>But we hit a problem here with VPC routers: their configuration is not
>persistent across reboots when they are rebooted outside cloudstack, as
>they are not configured (entirely) using kernel parameters
>(/var/cache/cloud/cmdline), but only when started by Cloudstack.
>
>It would be nice to see the VPC router config persist across
>reboots even when rebooted outside cloudstack, using the same
>mechanism as the other system vm's to make things more consistent and
>reliable.
>
>What is your opinion on this? Otherwise we will add it to our backlog to
>contribute improvements in this area.
>
>See also:
>
>https://issues.apache.org/jira/browse/CLOUDSTACK-4605
>https://issues.apache.org/jira/browse/CLOUDSTACK-4606
>https://issues.apache.org/jira/browse/CLOUDSTACK-4607
>
>
>Thanks & Cheers,
>Roeland Kuipers