-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15080/
-----------------------------------------------------------
Review request for cloudstack, Alex Huang, Chiradeep Vittal, and Darren
Shepherd.
Bugs: CLOUDSTACK-4855
https://issues.apache.org/jira/browse/CLOUDSTACK-4855
Repository: cloudstack-git
Description
-------
CloudStack sends requests to directly managed hypervisor (HV) hosts (direct
agents) using the direct agent thread pool. The size of the pool is set by the
global config direct.agent.pool.size, which defaults to 500.
Currently there is no restriction on the number of threads a direct agent can
use from this shared thread pool to send requests to its host. This is fine as
long as the host responds to requests in a reasonable amount of time. But if
there is a considerable delay in getting a response, the thread remains blocked
for that long. As more commands are sent to the slow host, more threads get
blocked. This can eventually lead to a situation where requests to healthy
hosts cannot be processed because there are not enough free threads.
The goal of this change is to localize the impact of a few bad hosts, so
that the entire management server is not affected.
One way to do this is to throttle based on the number of outstanding requests
on a per-host basis. The limit on outstanding requests to a host is a
percentage of the direct agent pool size, configurable through
direct.agent.thread.cap. This ensures that an impacted host is bound by an
upper cap on the number of threads it can use to process requests and cannot
consume the entire pool.
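As a rough illustration of the idea (not the code in the diff), a per-host cap could be tracked with an atomic counter along these lines. The class name HostThrottle and its methods are hypothetical, and the sketch assumes direct.agent.thread.cap is expressed as a fraction between 0 and 1 of the pool size; the actual representation may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; HostThrottle is not a CloudStack class.
final class HostThrottle {
    // Upper bound on in-flight requests for a single host.
    private final int maxOutstanding;
    // Number of requests currently in flight for this host.
    private final AtomicInteger outstanding = new AtomicInteger(0);

    // poolSize mirrors direct.agent.pool.size.
    // capFraction is an ASSUMED form of direct.agent.thread.cap (0 < f <= 1).
    HostThrottle(int poolSize, double capFraction) {
        // Never go below 1, so a host can always make some progress.
        this.maxOutstanding = Math.max(1, (int) (poolSize * capFraction));
    }

    // Reserve a slot for this host; returns false if the host is at its cap.
    boolean tryAcquire() {
        while (true) {
            int current = outstanding.get();
            if (current >= maxOutstanding) {
                return false;
            }
            if (outstanding.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Give the slot back once the request has completed or failed.
    void release() {
        outstanding.decrementAndGet();
    }

    int getMaxOutstanding() {
        return maxOutstanding;
    }

    int getOutstanding() {
        return outstanding.get();
    }
}
```

With one such object per host, a slow host can block at most maxOutstanding pool threads, and the rest of the pool stays available for healthy hosts.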
Note: The outstanding request count is checked in the Task.run() method so
that cron jobs that get scheduled at agent startup are also taken into account.
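A minimal sketch of what checking the count inside run() could look like is shown here. ThrottledTask is a hypothetical stand-in, not the actual Task class from DirectAgentAttache, and the counter and cap are passed in purely for illustration. Because the check happens when the task executes, it applies no matter how the task was submitted, including jobs scheduled at startup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a direct agent task; not the CloudStack Task class.
final class ThrottledTask implements Runnable {
    private final AtomicInteger outstanding; // per-host in-flight counter (assumed shared)
    private final int cap;                   // per-host upper cap on threads
    private final Runnable work;             // the actual request to the host
    private volatile boolean executed = false;

    ThrottledTask(AtomicInteger outstanding, int cap, Runnable work) {
        this.outstanding = outstanding;
        this.cap = cap;
        this.work = work;
    }

    @Override
    public void run() {
        // Count this task as in flight, then check against the cap.
        if (outstanding.incrementAndGet() > cap) {
            // Over the cap: undo the increment and bail out without
            // sending anything to the host.
            outstanding.decrementAndGet();
            return;
        }
        try {
            work.run();
            executed = true;
        } finally {
            // Always release the slot, even if the request throws.
            outstanding.decrementAndGet();
        }
    }

    boolean wasExecuted() {
        return executed;
    }
}
```

In this sketch a task that finds the host at its cap is simply dropped; what the real change does with such a request (retry, reschedule, or log) is not covered here.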
Diffs
-----
engine/orchestration/src/com/cloud/agent/manager/AgentAttache.java ff35255
engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java 3e684cc
engine/orchestration/src/com/cloud/agent/manager/DirectAgentAttache.java 7d3f765
Diff: https://reviews.apache.org/r/15080/diff/
Testing
-------
Verified by setting the per-agent upper cap to a value of 1 and checking that
requests still get scheduled but the executor thread simply bails out.
Thanks,
Koushik Das