Jessica Tomechak created CLOUDSTACK-1919:
--------------------------------------------
Summary: Runbooks
Key: CLOUDSTACK-1919
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-1919
Project: CloudStack
Issue Type: Improvement
Security Level: Public (Anyone can view this level - this is the default.)
Components: Doc
Reporter: Jessica Tomechak
The RS [RightScale] runbooks are good stuff - we should seriously consider
producing CS-specific content like that. (Kevin Kluge)
On 03/02/2012 01:27 PM, Chiradeep Vittal wrote:
> Those are useful /tools/. I am more interested in the /content/.
> Running a cloud is a combination of operating cloudstack (stop / start
> / add host / delete host/ devices / storage) + operating the storage +
> operating the network + operating the hypervisor + ancillary items
> like the SQL server database. It is clear for instance that the
> requisite checks were not done at [customer] before adding hosts to the
> cluster (check CPU level, driver patch levels, firmware upgrades), nor
> were they monitoring cloudstack for warnings about filling up storage
> or monitoring the XS hotfix mailblast. The Run Book for the cloud will
> contain content like this and solutions for when they receive alerts
> about storage filling up or how to recover corrupt VHDs, how to
> periodically back up primary storage, etc. How to transfer VMs between
> failure domains when a particular failure domain has failed.
> References to other runbooks such as How to back up and restore MySQL.
> Monitor CS server and the underlying hardware with Nagios etc. Host
> maintenance procedures, storage maintenance procedures.
>
> In addition, there needs to be a reference architecture for deploying
> a cloud (define your failure domains, plan for capacity based on
> service offerings, calculate IOPS requirements, network bandwidth,
> switch capacity, core router capacity, IP address planning – public and
> private).
>
> Finally, [before a user sets up a cloud, they should evaluate whether they
> are ready]. Do they have change
> management procedures? Do they have a CMDB? Do they have documented
> problem management procedures? Let me just throw ITIL in there.
>
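[Editor's sketch] The pre-add host checks mentioned above (CPU level, driver
patch levels, firmware) could be captured in a runbook as a simple comparison
against a cluster baseline. This is only a minimal illustration: the function
name, the dictionary keys, and the sample values are all hypothetical, and how
the values are actually collected from a hypervisor is not covered here.

```python
# Hypothetical sketch: compare a candidate host against a cluster baseline
# before adding it. Key names below are assumptions, not CloudStack fields.
DEFAULT_KEYS = ("cpu_level", "driver_patch_level", "firmware_version")


def find_host_mismatches(baseline, candidate, keys=DEFAULT_KEYS):
    """Return {key: (expected, actual)} for every checked key that differs.

    A key missing from either dictionary is reported as None on that side.
    """
    mismatches = {}
    for key in keys:
        expected = baseline.get(key)
        actual = candidate.get(key)
        if expected != actual:
            mismatches[key] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    cluster_baseline = {
        "cpu_level": "level-A",
        "driver_patch_level": "patch-7",
        "firmware_version": "fw-2",
    }
    new_host = {
        "cpu_level": "level-A",
        "driver_patch_level": "patch-5",
        "firmware_version": "fw-2",
    }
    problems = find_host_mismatches(cluster_baseline, new_host)
    if problems:
        for key, (expected, actual) in problems.items():
            print(f"MISMATCH {key}: expected {expected!r}, found {actual!r}")
    else:
        print("Host matches the cluster baseline.")
```

An empty result means the host matches the baseline on every checked item; any
entry is a reason to stop and fix the host before adding it to the cluster.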
I agree - honestly, the complexity of IaaS is incredible.
I'd go a step further in [the user] evaluation and say:
* Do they have config management?
* Do they have automated provisioning (esp. for hypervisors)? Can they get a
new hypervisor up in EXACTLY the same configuration as the last one, down to
network bonding, without any manual intervention?
* Do they have a monitoring system in place, and is it (or can it be made)
capable of monitoring CloudStack? [Need to define the baseline of what the user
should be monitoring]
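
[Editor's sketch] As one possible piece of that monitoring baseline, the
"storage filling up" case from the thread could be expressed as a threshold
check. This is a rough illustration under stated assumptions: the function
name, the pool names, and the 75% / 90% thresholds are hypothetical, and the
usage figures would have to come from whatever monitoring system is in place.

```python
# Hypothetical sketch: flag storage pools that are filling up.
# The thresholds are illustrative assumptions, not recommended values.
def storage_alerts(pools, warn_at=0.75, critical_at=0.90):
    """Return a list of (name, level, fraction_used) for pools needing attention.

    `pools` is an iterable of (name, used_bytes, total_bytes) tuples.
    Pools with a non-positive total are reported with level "unknown".
    """
    alerts = []
    for name, used, total in pools:
        if total <= 0:
            alerts.append((name, "unknown", None))
            continue
        fraction = used / total
        if fraction >= critical_at:
            alerts.append((name, "critical", fraction))
        elif fraction >= warn_at:
            alerts.append((name, "warning", fraction))
    return alerts


if __name__ == "__main__":
    # Made-up figures for illustration only.
    sample = [
        ("primary-1", 950, 1000),
        ("primary-2", 800, 1000),
        ("secondary-1", 100, 1000),
    ]
    for name, level, fraction in storage_alerts(sample):
        shown = "n/a" if fraction is None else f"{fraction:.0%}"
        print(f"{level.upper()}: {name} is at {shown} of capacity")
```

A runbook entry would then pair each alert level with the procedure to follow
when it fires.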