Yes that's very true! A vCPU has definitely gotten closer to a physical CPU but there will always be overheads due to the hypervisor.
I've never encountered this many vCPUs, and therefore these issues, before, so now I have another metric to add to my toolbox for future reference. Thanks again Mathieu!

Cheers, Pete

On Tuesday, 26 May 2015 04:59:50 UTC+10, Mathieu Grzybek wrote:
>
> Glad to hear that!
>
> In a virtualized environment the most difficult fact to understand is:
> « a vCPU is NOT a physical CPU! ». Most of the time, we need to reduce the
> number of vCPUs because of a stuck core (kernel panic) or very slow
> behaviour.
>
> That is why the co-stop metric needs to be monitored next to load average.
>
> Mathieu
>
> On 24 May 2015 at 23:59, Pete GS <[email protected]> wrote:
>
> Good to know, Arie!
>
> And thanks again to Mathieu.
>
> I have lots of experience with the virtualisation layer (ESX from 2.5
> onwards) but almost none with SMP workloads in a guest above about 8
> vCPUs, so I had never observed this behaviour before. Now I know a bit
> more, and I like to keep learning!
>
> I've come in after not looking at or touching Graylog over the weekend and
> there is not a single instance of any of the Graylog servers being unable
> to contact MongoDB, so it looks like this is fixed.
>
> The Elasticsearch cluster is green, none of the Graylog input servers have
> a queue of unprocessed messages in the journal, and all are online and
> working well.
>
> Fingers crossed I can get the new Elasticsearch hardware purchase through,
> which will solve my remaining problems and let the platform be 100% stable.
>
> Cheers, Pete
>
> On Saturday, 23 May 2015 07:49:24 UTC+10, Arie wrote:
>>
>> Same problem here with too many CPUs (not with the Graylog application).
>>
>> What happens is that code swaps continuously between cores; in our case
>> it helps to bind the application to a core, but managing that is a chore.
>> The virtual layer loses a lot of resources by constantly managing
>> resources across the cores. With 2 cores it can already be up to 10%!
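For readers wanting to try Arie's core-binding workaround: on Linux this can be done with `taskset` from util-linux. A minimal sketch, using the current shell as a stand-in for the real process (core numbers and the `pgrep` pattern are illustrative only):

```shell
# Show this shell's current CPU affinity,
# e.g. "pid 1234's current affinity list: 0-7"
taskset -cp $$

# Restrict this shell to core 0 (illustrative; for a service you would
# pin its PID instead, e.g. taskset -cp 0-5 "$(pgrep -f graylog-server)")
taskset -cp 0 $$

# Confirm: the affinity list now reads "0"
taskset -cp $$
```

Note the pinning is not inherited across restarts, which is part of why Arie found it a chore to manage.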
>>
>> We are running a lot of real-time applications and the customer wants
>> everything in the cloud. In our experience, 'cloud' is the source of most
>> of our problems/glitches. I love having some older iron to run Graylog
>> and Elasticsearch on.
>>
>> On Friday, 22 May 2015 06:08:44 UTC+2, Pete GS wrote:
>>>
>>> Ok, here's where I'm at with this...
>>>
>>> I tried implementing the kernel options on one of the Graylog servers
>>> as a test, but it made no appreciable difference. In fact, shortly after
>>> the first reboot the VM froze with a locked-CPU error. It hasn't done
>>> that since a subsequent reboot, though. We're not running the PVSCSI
>>> adapter either.
>>>
>>> After observing this, I revisited Mathieu's comment regarding too many
>>> CPUs.
>>>
>>> While I still see no contention for CPU resources, I started wondering
>>> whether there was some SMP-related issue in CentOS where the extra vCPUs
>>> just weren't providing enough extra capacity for the workload.
>>>
>>> I scaled all the nodes back to 8 vCPUs and added another four Graylog
>>> servers, so I now have 8 servers receiving the inputs.
>>>
>>> So far this is running a lot better than the four servers with 16 and
>>> 20 vCPUs. They still peak at 100%, but this is not sustained, even after
>>> an Elasticsearch issue (filling the disks again) that caused a backlog
>>> in the message journal overnight.
>>>
>>> Almost all of the message backlog in the journals has been processed
>>> again and it's still working well so far, after 24 hours or so.
>>>
>>> I'll see how it runs over the weekend.
>>>
>>> Incidentally, it seems I have inadvertently stumbled across a good
>>> number for the process buffer processors: two less than the number of
>>> vCPUs available to the server seems to work well, i.e. 6 buffer
>>> processors with 8 vCPUs. Of course, I'm not sure if this is just my
>>> particular environment or a general thing.
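For reference, the knob Pete is tuning here lives in Graylog's server configuration file. A sketch of his "vCPUs minus two" observation on an 8-vCPU node (the file path varies by install and Graylog version, and the output-buffer value is purely illustrative):

```
# /etc/graylog/server/server.conf (path may differ on your install)
processbuffer_processors = 6   # 8 vCPUs - 2, per Pete's observation
outputbuffer_processors = 3    # illustrative; size to your own output load
```

As Pete says, treat this as a starting point for your own environment rather than a general rule.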
>>>
>>> Cheers, Pete
>>>
>>> On Thursday, 14 May 2015 19:13:24 UTC+10, Pete GS wrote:
>>>>
>>>> Thanks very much Arie, I will check these tomorrow and report back.
>>>>
>>>> One thing I can confirm is that the heap size is configured correctly.
>>>>
>>>> Cheers, Pete
>>>>
>>>> On 14 May 2015, at 05:35, Arie <[email protected]> wrote:
>>>>
>>>> Let's try some more options.
>>>>
>>>> I see you are running your stuff virtualized, so you can consider the
>>>> following for CentOS 6.
>>>>
>>>> In your kernel boot config (/etc/grub.conf) you can add the following
>>>> options:
>>>>
>>>> nohz=off (for CPU-intensive systems)
>>>> elevator=noop (disk scheduling is done by the virtual layer, so
>>>> disable it in the guest)
>>>> cgroup_disable=memory (possibly not used; it frees up some memory and
>>>> allocation overhead)
>>>>
>>>> If you use the PVSCSI device, add the following:
>>>> vmw_pvscsi.cmd_per_lun=254
>>>> vmw_pvscsi.ring_pages=32
>>>>
>>>> Check disk buffers on the virtual layer too: see VMware KB 2053145,
>>>> http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2053145&sliceId=1&docTypeID=DT_KB_1_1&dialogID=621755330&stateId=1%200%20593866502
>>>>
>>>> Optimize your disks for performance (up to 30%! yes):
>>>>
>>>> For the filesystems where Graylog and/or Elasticsearch are located,
>>>> add the following to /etc/fstab.
>>>>
>>>> Example:
>>>> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier,data=writeback 1 1
>>>> and if you want to be safer:
>>>> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier 1 1
>>>>
>>>> Is ES_HEAP_SIZE configured in the correct place? (I did that wrong at
>>>> first.) It is in /etc/sysconfig/elasticsearch.
>>>>
>>>> All these options together can improve system performance hugely,
>>>> especially when the systems are virtual.
>>>>
>>>> ps did you correctly cha
>>>>
>>>> ...
>>>
>>>
> --
> You received this message because you are subscribed to the Google Groups
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
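For anyone landing on this thread later, Arie's tuning suggestions from up-thread collect into three files on CentOS 6, roughly as sketched below. Device names, the mount point, and the heap size are illustrative; note that `nobarrier` and `data=writeback` trade crash safety for speed, so test before using them in production:

```
# /etc/grub.conf -- append to the existing kernel line (shown as a comment
# here because the full line is host-specific):
#   kernel /vmlinuz-2.6.32-... ro root=... nohz=off elevator=noop cgroup_disable=memory
# and, only if the guest uses the PVSCSI adapter:
#   ... vmw_pvscsi.cmd_per_lun=254 vmw_pvscsi.ring_pages=32

# /etc/fstab -- data filesystem for Graylog/Elasticsearch (illustrative):
/dev/mapper/vg_data-lv_es  /var/lib/elasticsearch  ext4  defaults,noatime,nobarrier  1 2

# /etc/sysconfig/elasticsearch -- heap goes here on CentOS, not in
# elasticsearch.yml (the mistake Arie mentions making at first):
ES_HEAP_SIZE=16g   # illustrative; common advice is ~half of RAM, below 32g
```

A reboot is needed for the kernel options, and a remount (or reboot) for the fstab changes.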
