make sense
2013/9/3 Marcus Sorensen <shadow...@gmail.com>

> I'm trying to figure out if/how management and agent restarts are
> gracefully handled for long running jobs. My initial testing shows
> that maybe they aren't. For example, if I try to migrate a storage
> volume, and then restart the management server, I end up with two
> volumes (source and destination) stuck in migrating state, with the VM
> unable to start and the job stating:
>
> {
>     "accountid": "505add16-12d8-11e3-8495-5254004eff4f",
>     "cmd": "org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd",
>     "created": "2013-09-03T11:41:55-0600",
>     "jobid": "698cc7cf-4ecc-40da-9bcf-261a7921ab95",
>     "jobprocstatus": 0,
>     "jobresult": {
>         "errorcode": 530,
>         "errortext": "job cancelled because of management server restart"
>     },
>     "jobresultcode": 530,
>     "jobresulttype": "object",
>     "jobstatus": 2,
>     "userid": "505bd5d6-12d8-11e3-8495-5254004eff4f"
> }
>
> If all jobs react this way, it doesn't seem like a small bug, but
> perhaps a design issue. If a job is cancelled, the state should be
> rolled back, I think. Perhaps every job should have a cleanup method
> that is called when the job is considered cancelled (assuming the
> cancellation occurs prior to shutdown, but then that doesn't handle
> crashes).
>
> The end result is that everyone using cloudstack should be terrified
> of restarting their mgmt server, I think, especially as their
> environment grows and has many things going on. Anything that goes
> through a state machine could get stuck.
>
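One way to picture the per-job cleanup idea, including the crash case, is to run the rollback at startup rather than at shutdown: any job still recorded as in progress when the server comes back up must have been interrupted, whether by a clean restart or a crash. The sketch below is only an illustration of that idea, not how CloudStack's job framework works; `RecoverableJob`, `JobTable`, `recoverInterruptedJobs`, `MigrateVolumeJob`, and the `Ready`/`Destroy` target states are all hypothetical names. The job table is kept in memory here for brevity, whereas a real version would persist it and rebuild each job from the database after a restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JobRecoverySketch {

    enum JobStatus { IN_PROGRESS, SUCCEEDED, FAILED, CANCELLED }

    // Hypothetical contract: every long-running job knows how to undo partial work.
    interface RecoverableJob {
        String id();
        void rollback();
    }

    // Hypothetical stand-in for a persistent job table (in-memory for this sketch).
    static final class JobTable {
        private final Map<String, JobStatus> status = new LinkedHashMap<>();
        private final Map<String, RecoverableJob> jobs = new LinkedHashMap<>();

        void start(RecoverableJob job) {
            jobs.put(job.id(), job);
            status.put(job.id(), JobStatus.IN_PROGRESS);
        }

        void finish(String id, JobStatus result) {
            status.put(id, result);
        }

        JobStatus statusOf(String id) {
            return status.get(id);
        }

        // Called once at startup: anything still IN_PROGRESS was interrupted by a
        // restart or a crash, so roll it back first, then mark it CANCELLED.
        List<String> recoverInterruptedJobs() {
            List<String> recovered = new ArrayList<>();
            for (Map.Entry<String, JobStatus> e : status.entrySet()) {
                if (e.getValue() == JobStatus.IN_PROGRESS) {
                    jobs.get(e.getKey()).rollback();
                    e.setValue(JobStatus.CANCELLED); // value update, not a structural change
                    recovered.add(e.getKey());
                }
            }
            return recovered;
        }
    }

    // Hypothetical volume migration job operating on a simple name -> state map.
    static final class MigrateVolumeJob implements RecoverableJob {
        private final String id;
        private final Map<String, String> volumeStates;

        MigrateVolumeJob(String id, Map<String, String> volumeStates) {
            this.id = id;
            this.volumeStates = volumeStates;
        }

        public String id() {
            return id;
        }

        public void rollback() {
            // The source goes back to a usable state; the half-copied destination
            // is flagged for cleanup instead of being left in "Migrating".
            volumeStates.put("source", "Ready");
            volumeStates.put("destination", "Destroy");
        }
    }

    public static void main(String[] args) {
        Map<String, String> volumes = new LinkedHashMap<>();
        volumes.put("source", "Migrating");
        volumes.put("destination", "Migrating");

        JobTable table = new JobTable();
        table.start(new MigrateVolumeJob("job-1", volumes));

        // Simulated restart before the job finished.
        List<String> recovered = table.recoverInterruptedJobs();
        System.out.println(recovered + " " + table.statusOf("job-1") + " " + volumes);
    }
}
```

Running `main` prints `[job-1] CANCELLED {source=Ready, destination=Destroy}`. Doing the cleanup at startup rather than in a shutdown hook is the design choice that covers the crash case: nothing has to run while the server is going down.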