Hi,

In the past, a couple of our clients have had issues with indirect agents (KVM hosts and system VMs) connecting over port 8250, particularly if connectivity to the management server(s) was lost. Both had 300+ indirect agents active.
In these circumstances we have found that running netstat on the mgmt server(s) revealed many open but unused connections to port 8250. I recall that at one point we found the agent connection code had been altered to attempt a reconnect if the connection didn't complete within 10 seconds. However, the failed connection would take 60 seconds to time out.

Another time we found that both the management server and the MySQL DB were being starved of connections to the MySQL DB, so the reconnections could not be processed fast enough. The default from the mgmt server is 100 connections, and the documented setting for MySQL is 350 connections. However, external connections (and additional mgmt servers) require these to be adjusted.

-- just some ideas...

Regards,

Paul Angus
VP Technology/Cloud Architect
S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
paul.an...@shapeblue.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.li...@gmail.com]
Sent: 19 November 2015 20:32
To: dev@cloudstack.apache.org
Subject: Re: [STABILITY]} Large KVM Infrastructure with ACS

Rafael,

Please see response in-line:

On 11/18/15 4:16 PM, Rafael Weingärtner wrote:
> When you say 250+, you mean 250+ host spread in lots of cluster, right?
> If I am not mistaken, ACS limits the number of KVM hosts in a cluster,
> something like 50? I do not remember now if that value can be
> configured, may it can be.

Yes, lots of clusters, way less than 50 per cluster.

> I recall to have read something in a Red Hat doc about the KVM that it
> does not have limit of hosts in a cluster. Actually, it does not seem
> to have the figure of cluster at all. That is created solely in ACS,
> to facilitate the management.
>
> To debug the problem, I would start with the following questions:
>
> Is every single cluster of your environment is presenting that problem?

No, a few clusters with some nodes within the cluster - not all.

> What is the size of physical hosts that you have in your environment?
> Do all of them have the same configuration?

Yes, all hosts have the same configuration. Can't go into details, but it's rather large.

> Do you know the load (resource allocated and used) that is being
> imposed in those hosts that had shown those problems?
> What is your over commitment/provisioning factor that you are using?

Servers are not heavily taxed; we don't over commit memory, and other components could be over committed by 2 or less. Overall, we still have capacity to accommodate more VMs if needed, we just don't max it out.

----

Both Marcus and I are looking through this; it could be just our specific implementation - hence, I wanted to see if anyone else in the community with heavy KVM usage came across this issue.

Maybe I need to ping LeaseWeb and ExtremePC folks..

Thanks,
ilya

>
> On Wed, Nov 18, 2015 at 8:19 PM, Daan Hoogland
> <daan.hoogl...@gmail.com>
> wrote:
>
>> sounds like a bad limit Ilya, i'll keep an eye out.
>>
>> On Wed, Nov 18, 2015 at 10:10 PM, ilya <ilya.mailing.li...@gmail.com>
>> wrote:
>>
>>> I'm curious if anyone runs ACS with atleast 250+ KVM hosts.
>>>
>>> We've been noticing weird issues with KVM where occasionally lots of
>>> KVM agents get Nio connection closed issue followed by barrage of alerts.
>>>
>>> In some instances the agent reconnects right away and in other - it
>>> attempts to reconnect but never receives an ACK from MS.
>>>
>>> Please let me know if you notice anything like it and if you found a
>>> solution.
>>>
>>> Also, it would help to know what global settings have been tuned to
>>> make things work better (aside from direct.agent.*) and how MS are running.
>>>
>>> Thanks
>>> ilya
>>>
>>
>> --
>> Daan
>>
>
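The three points in the reply at the top of this message (unused connections on port 8250, a 10-second retry against a 60-second timeout, and the 100 vs. 350 connection figures) can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The function names, the sample netstat text and its IP addresses are made up for the example. The column layout is assumed to be that of Linux `netstat -ant`. The retry estimate is a simple upper bound, not a description of the actual agent code.

```python
import math
from collections import Counter

# Hypothetical sample in the usual Linux `netstat -ant` layout:
# proto, recv-q, send-q, local address, foreign address, state.
SAMPLE = """\
tcp        0      0 10.0.0.5:8250      0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:8250      10.0.1.20:51234    ESTABLISHED
tcp        0      0 10.0.0.5:8250      10.0.1.21:40022    ESTABLISHED
tcp        0      0 10.0.0.5:3306      10.0.0.5:55110     ESTABLISHED
"""


def count_by_state(netstat_output, port=8250):
    """Tally TCP sockets whose local port is `port`, grouped by state."""
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local_port = parts[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            states[parts[5]] += 1
    return states


def worst_case_pending(agents, retry_s, timeout_s):
    """Upper bound on attempts still open on the server if every agent
    retries every `retry_s` seconds and each failed attempt lingers for
    `timeout_s` seconds (assumption: no attempt completes)."""
    return agents * math.ceil(timeout_s / retry_s)


def required_max_connections(mgmt_servers, pool_per_server, external):
    """DB connections needed if every mgmt server fills its pool and
    `external` other clients connect at the same time."""
    return mgmt_servers * pool_per_server + external


if __name__ == "__main__":
    print(count_by_state(SAMPLE))
    print(worst_case_pending(300, 10, 60))       # 300 * ceil(60 / 10) = 1800
    print(required_max_connections(4, 100, 50))  # 4 * 100 + 50 = 450
```

With the figures from the reply, 300 agents retrying every 10 seconds against a 60-second timeout could leave up to 1800 attempts open at once. The four management servers and 50 external connections are hypothetical values; with the default of 100 connections per server they would need 450 connections, more than the documented 350.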