How many MS do you have in your environment?

On Thu, Nov 19, 2015 at 7:56 PM, Paul Angus <paul.an...@shapeblue.com> wrote:
> Hi,
>
> In the past a couple of clients of ours have had issues with indirect
> agents (KVM hosts and system VMs) connecting over port 8250, particularly
> if connectivity was lost to the management server(s). They both had 300+
> indirect agents active.
>
> In these circumstances we have found that running a netstat to see
> connections to port 8250 on the mgmt server(s) revealed many open but
> unused connections to port 8250.
>
> I recall at one time we found the agent connection code had been altered
> to attempt to reconnect if the connection didn't complete within 10 secs.
> However the failed connection would take 60 seconds to time out.
>
> Another time we found that management server and mysql db were both being
> starved of enough connections to the mysql db to process the reconnections
> fast enough. The default from the mgmt server is 100 connections and the
> documented setting for mysql is 350 connections. However external
> connections (and additional mgmt servers) require these to be adjusted.
>
> -- just some ideas...
>
>
> Regards,
>
> Paul Angus
> VP Technology/Cloud Architect
> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
> paul.an...@shapeblue.com
>
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.li...@gmail.com]
> Sent: 19 November 2015 20:32
> To: dev@cloudstack.apache.org
> Subject: Re: [STABILITY]} Large KVM Infrastructure with ACS
>
> Rafael,
>
> Please see response in-line:
>
> On 11/18/15 4:16 PM, Rafael Weingärtner wrote:
> > When you say 250+, you mean 250+ hosts spread across lots of clusters,
> > right? If I am not mistaken, ACS limits the number of KVM hosts in a
> > cluster, something like 50? I do not remember now if that value can be
> > configured, maybe it can be.
>
> Yes lots of clusters, way less than 50 per cluster.
>
> > I recall having read something in a Red Hat doc about KVM saying that
> > it has no limit on hosts in a cluster. Actually, it does not seem
> > to have the notion of a cluster at all.
> > That is created solely in ACS, to facilitate the management.
> >
> > To debug the problem, I would start with the following questions:
> >
> > Is every single cluster of your environment presenting that problem?
>
> No, few clusters with some nodes within the cluster - not all.
>
> > What is the size of physical hosts that you have in your environment?
> > Do all of them have the same configuration?
> Yes, all hosts have the same configuration. Can't go into details, but it's
> rather large.
>
> > Do you know the load (resource allocated and used) that is being
> > imposed on those hosts that had shown those problems?
> > What is your over commitment/provisioning factor that you are using?
> Servers are not heavily taxed, we don't over commit memory, other
> components could be over committed by 2 or less. Overall, we still have
> capacity to accommodate more VMs if needed, we just don't max it out.
>
> ----
>
> Both Marcus and myself are looking through this, it could be just our
> specific implementation - hence, I wanted to see if anyone else in the
> community with heavy KVM usage came across this issue.
>
> Maybe I need to ping LeaseWeb and ExtremePC folks..
>
> Thanks,
> ilya
>
>
> > On Wed, Nov 18, 2015 at 8:19 PM, Daan Hoogland
> > <daan.hoogl...@gmail.com>
> > wrote:
> >
> >> sounds like a bad limit Ilya, i'll keep an eye out.
> >>
> >> On Wed, Nov 18, 2015 at 10:10 PM, ilya <ilya.mailing.li...@gmail.com>
> >> wrote:
> >>
> >>> I'm curious if anyone runs ACS with at least 250+ KVM hosts.
> >>>
> >>> We've been noticing weird issues with KVM where occasionally lots of
> >>> KVM agents get Nio connection closed issue followed by barrage of
> >>> alerts.
> >>>
> >>> In some instances the agent reconnects right away and in other - it
> >>> attempts to reconnect but never receives an ACK from MS.
> >>>
> >>> Please let me know if you notice anything like it and if you found a
> >>> solution.
> >>>
> >>> Also, it would help to know what global settings have been tuned to
> >>> make things work better (aside from direct.agent.*) and how MS are
> >>> running.
> >>>
> >>> Thanks
> >>> ilya
> >>>
> >>
> >>
> >> --
> >> Daan
> >>
> >
>

--
Rafael Weingärtner
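
The netstat check Paul describes (looking at connections to port 8250 on the management server) can be scripted. A minimal sketch in Python, assuming Linux `netstat -ant` style output; the function name and the sample addresses are hypothetical and not taken from CloudStack:

```python
from collections import Counter

AGENT_PORT = 8250  # agent port named in the thread


def count_agent_connections(netstat_output, port=AGENT_PORT):
    """Count TCP sockets whose local port is `port`, grouped by state.

    Assumes the column layout printed by Linux `netstat -ant`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # rsplit handles both "10.0.0.5:8250" and IPv6 ":::8250".
        if local_addr.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts


# Hypothetical sample, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8250            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:8250           10.0.1.17:45122         ESTABLISHED
tcp        0      0 10.0.0.5:8250           10.0.1.18:51340         ESTABLISHED
tcp        0      0 10.0.0.5:8250           10.0.1.19:40112         CLOSE_WAIT
tcp        0      0 10.0.0.5:3306           10.0.0.6:39000          ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_agent_connections(SAMPLE).items()):
        print(state, n)
```

On a live management server the input could come from `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`. Comparing the per-state totals with the number of agents expected to be connected gives a rough view of how many sockets are open beyond what the agents need; which states count as "unused" is left to the operator.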