Glad to hear that! In a virtualized environment the hardest fact to grasp is: "a vCPU is NOT a physical CPU!". Most of the time we need to reduce the number of vCPUs because of a stuck core (kernel panic) or very slow behaviour.
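When more vCPUs are configured than the host can co-schedule, the hypervisor accumulates co-stop time, which esxtop exposes as the %CSTP column. The sketch below flags VMs with sustained co-stop from sampled readings; the ~3% rule-of-thumb threshold and the sample figures are my own assumptions for illustration, not numbers from this thread:

```python
# Sketch: flag VMs whose average co-stop (%CSTP, e.g. from esxtop batch
# output) suggests they have more vCPUs than the host can co-schedule.
# The 3.0% threshold is a common rule of thumb, not an official limit.

def costop_suspects(samples, threshold=3.0):
    """samples maps VM name -> list of %CSTP readings; returns suspects."""
    return sorted(
        vm for vm, readings in samples.items()
        if sum(readings) / len(readings) > threshold
    )

# Hypothetical readings for three VMs:
samples = {
    "graylog-01": [0.4, 0.9, 0.7],   # healthy
    "graylog-02": [5.2, 6.8, 4.9],   # sustained co-stop: too many vCPUs?
    "es-01":      [1.1, 2.0, 1.5],
}
print(costop_suspects(samples))  # → ['graylog-02']
```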
That is why the co-stop metric needs to be monitored next to load average.

Mathieu

> On 24 May 2015, at 23:59, Pete GS <[email protected]> wrote:
>
> Good to know Arie!
>
> And thanks again to Mathieu.
>
> I have lots of experience with the virtualisation layer (ESX from 2.5 onwards) but almost none with SMP workloads in a guest above about 8 vCPUs, so I had never observed this behaviour before. Now I know a bit more, and I like to keep learning!
>
> I've come in after not looking at or touching Graylog over the weekend and there is not a single instance of any of the Graylog servers being unable to contact MongoDB, so it looks like this is fixed.
>
> The Elasticsearch cluster is green, none of the Graylog input servers have a queue of unprocessed messages in the journal, and all are online and working well.
>
> Fingers crossed I can get the new Elasticsearch hardware purchase through, which will solve my remaining problems and let the platform be 100% stable.
>
> Cheers, Pete
>
> On Saturday, 23 May 2015 07:49:24 UTC+10, Arie wrote:
>
> Same problem here with too many CPUs (not on the Graylog application).
>
> What happens is that code swaps continuously between cores; in our case it helps to bind the application to a core, but managing that is a chore. The virtual layer loses a lot of resources in constantly juggling work across the cores. With 2 cores it can already be up to 10%!
>
> We run a lot of real-time applications and the customer wants everything in the cloud. In our experience, "cloud" delivers most of our problems/glitches. I love having some older iron to run Graylog and Elasticsearch on.
>
> On Friday, 22 May 2015 06:08:44 UTC+2, Pete GS wrote:
>
> Ok, here's where I'm at with this...
>
> I tried implementing the kernel options on one of the Graylog servers as a test, but it made no appreciable difference. In fact, shortly after the first reboot the VM froze with a locked CPU error.
> It hasn't done that since a subsequent reboot, though. We're not running the PVSCSI adapter either.
>
> After observing this, I revisited Mathieu's comment regarding too many CPUs.
>
> While I still see no contention issues for CPU resources, I started wondering if there was some SMP-related issue with CentOS where the extra vCPUs just weren't providing enough extra capacity for the workload.
>
> I scaled all the nodes back to 8 vCPUs and added another four Graylog servers, so I now have 8 servers receiving the inputs.
>
> So far this is running a lot better than the four servers with 16 and 20 vCPUs. They still peak at 100%, but it is not sustained, even after an Elasticsearch issue (filling the disks again) caused a backlog in the message journal overnight.
>
> Almost all the message backlog in the journals has been processed again and it's still working well so far, after 24 hours or so.
>
> I'll see how it runs over the weekend.
>
> Incidentally, it seems I have inadvertently stumbled across a good number for the process buffer processors: it seems to work well at 2 less than the number of CPUs available to the server. Running with a buffer number of 6 on 8 vCPUs works well. Of course, I'm not sure if this is just my particular environment or a general thing.
>
> Cheers, Pete
>
> On Thursday, 14 May 2015 19:13:24 UTC+10, Pete GS wrote:
>
> Thanks very much Arie, I will check these tomorrow and report back.
>
> One thing I can confirm is that the heap size is configured correctly.
>
> Cheers, Pete
>
> On 14 May 2015, at 05:35, Arie <[email protected]> wrote:
>
> Let's try some more options.
>
> I see you are running your stuff virtualized.
> Then you can consider the following for CentOS 6.
>
> In your kernel boot config (/etc/grub.conf) you can add the following options:
>
> nohz=off (for CPU-intensive systems)
> elevator=noop (disk scheduling is done by the virtual layer, so disable it in the guest)
> cgroup_disable=memory (possibly not used; it frees up some memory and allocation overhead)
>
> If you use the pvscsi device, add the following:
> vmw_pvscsi.cmd_per_lun=254
> vmw_pvscsi.ring_pages=32
>
> Check disk buffers on the virtual layer too: VMware KB 2053145, see
> http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2053145&sliceId=1&docTypeID=DT_KB_1_1&dialogID=621755330&stateId=1%200%20593866502
>
> Optimize your disks for performance (up to 30%! yes):
>
> For the filesystems where Graylog and/or Elasticsearch are located, add the following to /etc/fstab.
>
> Example:
> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier,data=writeback 1 1
> And if you want to be safer:
> /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier 1 1
>
> Is ES_HEAP_SIZE configured in the correct place? (I did that wrong at first.) It is in /etc/sysconfig/elasticsearch.
>
> All these options together can improve system performance hugely, especially when the systems are virtual.
>
> PS: did you correctly cha
> ...
>
> --
> You received this message because you are subscribed to the Google Groups "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
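Pete's sizing rule from earlier in the thread (process buffer processors = vCPU count minus 2) can be expressed as a small helper. The setting name `processbuffer_processors` comes from Graylog's server.conf; the helper itself is my own illustration, not something posted in the thread:

```python
# Sketch of the rule of thumb: leave two vCPUs free for the OS, GC, and
# the other Graylog buffers, and give the rest to the process buffer.

def processbuffer_processors(vcpus):
    """Suggested value for 'processbuffer_processors' in server.conf."""
    return max(1, vcpus - 2)

for vcpus in (4, 8, 16):
    print(f"{vcpus} vCPUs -> processbuffer_processors = {processbuffer_processors(vcpus)}")
# 8 vCPUs gives 6, matching the value Pete settled on.
```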
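Arie's fstab example can be sanity-checked with a few lines that parse the mount-options field; the parsing helper and the check are my own sketch (the mount options themselves come from his post, and note that nobarrier and data=writeback trade crash safety for speed, as his "safer" variant implies):

```python
# Check whether an fstab entry carries the suggested performance options.
# The fourth whitespace-separated field of an fstab line is the
# comma-separated mount-option list.

def mount_options(fstab_line):
    return set(fstab_line.split()[3].split(","))

line = "/dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier,data=writeback 1 1"
opts = mount_options(line)
print("noatime" in opts and "nobarrier" in opts)  # → True
print("data=writeback" in opts)                   # → True
```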
