Good to know Arie!

And thanks again to Mathieu.

I have lots of experience with the virtualisation layer (ESX from 2.5 
onwards) but almost none with SMP workloads in a guest above about 8 
vCPUs, so I had never observed this behaviour before. Now I know a bit 
more, and I like to keep learning!

I've come in after not looking at or touching Graylog over the weekend, 
and there is not a single instance of any of the Graylog servers being 
unable to contact MongoDB, so it looks like this is fixed.

The ElasticSearch cluster is green, none of the Graylog input servers 
has a queue of unprocessed messages in the journal, and all are online 
and working well.

Fingers crossed I can get the new ElasticSearch hardware purchase through, 
which will solve my remaining problems and let the platform be 100% stable.

Cheers, Pete

On Saturday, 23 May 2015 07:49:24 UTC+10, Arie wrote:
>
> Same problem here with too many CPUs (not with the Graylog application).
>
> What happens is that the code continuously swaps between cores. In our 
> case it helps to bind the application to a core, but managing that is a 
> chore. The virtual layer loses a lot of resources constantly managing 
> work across the cores; with 2 cores the overhead can already be up to 10%!
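As a sketch, the core binding described above can be done with taskset from util-linux; the core numbers and the command here are placeholders for your own workload, not details from this thread:

```shell
# Start a command pinned to cores 0 and 1 so the scheduler cannot
# migrate it between vCPUs ("sleep" stands in for your application):
taskset -c 0,1 sleep 1 &
pid=$!

# Show the current CPU affinity of a running process:
taskset -cp "$pid"

# Re-pin an already-running process to a single core by PID:
taskset -cp 0 "$pid"
wait "$pid"
```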
>
> We are running a lot of real-time applications, and the customer wants 
> everything in the cloud. In our experience, 'cloud' is the source of 
> most of our problems/glitches. I love having some older iron to run 
> Graylog and Elasticsearch on.
>
>
>
> On Friday, 22 May 2015 06:08:44 UTC+2, Pete GS wrote:
>>
>> Ok, here's where I'm at with this...
>>
>> I tried implementing the kernel options on one of the Graylog servers 
>> as a test, but it made no appreciable difference. In fact, shortly after 
>> the first reboot the VM froze with a locked-CPU error. It hasn't done 
>> that since a subsequent reboot, though. We're not running the PVSCSI 
>> adapter either.
>>
>> After observing this, I revisited Mathieu's comment regarding too many 
>> CPUs.
>>
>> While I still see no contention for CPU resources, I started wondering 
>> if there was some SMP-related issue with CentOS where the extra vCPUs 
>> just weren't providing enough additional capacity to handle the workload.
>>
>> I scaled all the nodes back to 8 vCPUs and added another four Graylog 
>> servers, so I now have 8 servers receiving the inputs.
>>
>> So far this is running a lot better than the four servers with 16 and 
>> 20 vCPUs. They still peak at 100%, but the peaks are not sustained, even 
>> after an ElasticSearch issue (the disks filling again) caused a backlog 
>> in the message journal overnight.
>>
>> Almost all of the message backlog in the journals has been processed, 
>> and it's still working well so far, roughly 24 hours in.
>>
>> I'll see how it runs over the weekend.
>>
>> Incidentally, I seem to have inadvertently stumbled across a good number 
>> for the process buffer processors: two less than the number of CPUs 
>> available to the server. Running with 6 processors on 8 vCPUs works 
>> well. Of course, I'm not sure if this is specific to my environment or 
>> a general rule.
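That rule of thumb maps to a single setting in the Graylog server configuration file; the path and the second line's comment below are an illustration for an 8-vCPU node, not something quoted from this thread:

```
# Graylog server config, e.g. /etc/graylog/server/server.conf
# (path varies by version/install)
# Rule of thumb from above: vCPUs - 2, so 6 on an 8-vCPU node
processbuffer_processors = 6
```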
>>
>> Cheers, Pete
>>
>> On Thursday, 14 May 2015 19:13:24 UTC+10, Pete GS wrote:
>>>
>>> Thanks very much Arie, I will check these tomorrow and report back.
>>>
>>> One thing I can confirm is the heap size is configured correctly.
>>>
>>> Cheers, Pete
>>>
>>> On 14 May 2015, at 05:35, Arie <[email protected]> wrote:
>>>
>>> Let's try some more options.
>>>
>>> I see you are running your stuff virtualised, so you can consider the 
>>> following for CentOS 6.
>>>
>>> You can add the following options to your kernel boot line 
>>> (/etc/grub.conf):
>>>
>>>   nohz=off (for CPU-intensive systems)
>>>   elevator=noop (disk scheduling is done by the virtual layer, so 
>>> disable it in the guest)
>>>   cgroup_disable=memory (if you don't use cgroups; it frees up some 
>>> memory and allocation overhead)
>>>   
>>> If you use the pvscsi device, add the following:
>>>   vmw_pvscsi.cmd_per_lun=254
>>>   vmw_pvscsi.ring_pages=32
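Put together, the kernel line in /etc/grub.conf would look something like the sketch below; the kernel version and root device are placeholders, and the vmw_pvscsi parameters only apply if the VM actually uses the paravirtual SCSI adapter:

```
# /boot/grub/grub.conf kernel line (all one line in GRUB legacy;
# kernel version and root device are placeholders)
kernel /vmlinuz-2.6.32-504.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root nohz=off elevator=noop cgroup_disable=memory vmw_pvscsi.cmd_per_lun=254 vmw_pvscsi.ring_pages=32
```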
>>>
>>>  Check disk buffers on the virtual layer too; see VMware KB 2053145: 
>>> http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2053145&sliceId=1&docTypeID=DT_KB_1_1&dialogID=621755330&stateId=1%200%20593866502
>>>
>>>  Optimize your disk for performance (up to 30%!!! yes):
>>>
>>>  For the filesystems where Graylog and/or Elasticsearch data is 
>>> located, add the following mount options in /etc/fstab.
>>>
>>> example:
>>> /dev/mapper/vg_nagios-lv_root /  ext4 
>>> defaults,noatime,nobarrier,data=writeback 1 1
>>> and if you want to be safer:
>>> /dev/mapper/vg_nagios-lv_root /  ext4 defaults,noatime,nobarrier 1 1
>>>
>>> Is ES_HEAP_SIZE configured in the correct place? (I got that wrong at 
>>> first.) It is in /etc/sysconfig/elasticsearch.
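For reference, a minimal sketch of that file; the 16g value is an illustration, not a figure from this thread (a common guideline is roughly half of RAM, kept below ~31 GB so the JVM can use compressed object pointers):

```
# /etc/sysconfig/elasticsearch (RPM install), not elasticsearch.yml
ES_HEAP_SIZE=16g
```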
>>>
>>>
>>> All these options together can improve system performance hugely, 
>>> especially on virtualised machines.
>>>
>>> ps did you correctly cha
>>>
>>> ...
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
