Ok, here's where I'm at with this... I tried the kernel options on one of the Graylog servers as a test, but they made no appreciable difference. In fact, shortly after the first reboot the VM froze with a locked CPU error, although it hasn't done that again since a subsequent reboot. We're not running the PVSCSI adapter either.
After observing this, I revisited Mathieu's comment regarding too many CPUs. While I still see no contention for CPU resources, I started wondering whether there was some SMP-related issue with CentOS where the extra vCPUs just weren't providing enough extra capacity for the workload.

I scaled all the nodes back to 8 vCPUs and added another four Graylog servers, so I now have 8 servers receiving the inputs. So far this is running a lot better than the four servers with 16 and 20 vCPUs. They still peak at 100% CPU, but this is not sustained, even after an ElasticSearch issue (filling the disks again) that caused a backlog in the message journal overnight. Almost all of the backlog in the journals has been processed again, and it's still working well after 24 hours or so. I'll see how it runs over the weekend.

Incidentally, it seems I have inadvertently stumbled across a good number for the process buffer processors: it seems to work well at 2 fewer than the number of CPUs available to the server. Running with 6 process buffer processors on 8 vCPUs works well. Of course, I'm not sure if this is just my particular environment or if it's a general thing.

Cheers, Pete

On Thursday, 14 May 2015 19:13:24 UTC+10, Pete GS wrote:
>
> Thanks very much Arie, I will check these tomorrow and report back.
>
> One thing I can confirm is the heap size is configured correctly.
>
> Cheers, Pete
>
> On 14 May 2015, at 05:35, Arie <[email protected]> wrote:
>
> Let's try some more options.
>
> I see you are running your stuff virtual. Then you can consider the
> following for CentOS 6.
>
> In your startup kernel config (/etc/grub.conf) you can add the
> following options:
>
> nohz=off (for highly CPU-intensive systems)
> elevator=noop (disk scheduling is done by the virtual layer, so disable that)
> cgroup_disable=memory (possibly not used; it frees up some memory and allocation)
>
> If you use the pvscsi device, add the following:
> vmw_pvscsi.cmd_per_lun=254
> vmw_pvscsi.ring_pages=32
>
> Check disk buffers on the virtual layer too. See VMware KB 2053145:
> http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2053145&sliceId=1&docTypeID=DT_KB_1_1&dialogID=621755330&stateId=1%200%20593866502
>
> Optimize your disk for performance (up to 30%!!! yes):
>
> For the filesystems where graylog and/or elastic is located, add the
> following to /etc/fstab
>
> example:
> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier,data=writeback 1 1
> and if you want to be more safe:
> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier 1 1
>
> Is ES_HEAP_SIZE configured at the correct place (I did that wrong at first)?
> It is in /etc/sysconfig/elasticsearch
>
> All these options together can hugely improve system performance,
> especially when the systems are virtual.
>
> ps did you correctly cha
>
> ...
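
For anyone wanting to try the process buffer rule of thumb from Pete's message above (available CPUs minus 2, e.g. 6 on 8 vCPUs), here is a minimal sketch in Python. The function names and the `reserve` parameter are made up for illustration, and the floor of 1 is an assumption added so small VMs don't end up with zero processors. The `processbuffer_processors` key is, as far as I know, the setting name used in the Graylog server configuration file, but check it against your own config before relying on it. As Pete says, this may only hold in his environment.

```python
import os


def suggested_processbuffer_processors(cpu_count, reserve=2):
    """Return cpu_count minus `reserve`, but never less than 1.

    `reserve=2` mirrors the observation in the thread (6 processors on
    8 vCPUs); the floor of 1 is an assumption, not something from the thread.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return max(1, cpu_count - reserve)


def config_line(cpu_count):
    # Format the suggestion as a key = value line for the Graylog
    # server configuration file (key name assumed, see note above).
    return "processbuffer_processors = %d" % suggested_processbuffer_processors(cpu_count)


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    cpus = os.cpu_count() or 1
    print(config_line(cpus))
```

Run on the Graylog node itself, this prints a line based on the CPUs visible to that VM, which you can compare with the value currently in your configuration.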
