Makes sense.

2013/9/3 Marcus Sorensen <shadow...@gmail.com>

> I'm trying to figure out if/how management and agent restarts are
> gracefully handled for long-running jobs. My initial testing shows
> that maybe they aren't. For example, if I try to migrate a storage
> volume, and then restart the management server, I end up with two
> volumes (source and destination) stuck in migrating state, with the VM
> unable to start and the job stating:
>
>             {
>                 "accountid": "505add16-12d8-11e3-8495-5254004eff4f",
>                 "cmd": "org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd",
>                 "created": "2013-09-03T11:41:55-0600",
>                 "jobid": "698cc7cf-4ecc-40da-9bcf-261a7921ab95",
>                 "jobprocstatus": 0,
>                 "jobresult": {
>                     "errorcode": 530,
>                     "errortext": "job cancelled because of management server restart"
>                 },
>                 "jobresultcode": 530,
>                 "jobresulttype": "object",
>                 "jobstatus": 2,
>                 "userid": "505bd5d6-12d8-11e3-8495-5254004eff4f"
>             }
>
> If all jobs react this way, it doesn't seem like a small bug, but
> perhaps a design issue. If a job is cancelled, the state should be
> rolled back, I think. Perhaps every job should have a cleanup method
> that is called when the job is considered cancelled (assuming the
> cancellation occurs prior to shutdown, but then that doesn't handle
> crashes).
>
> The end result is that everyone using CloudStack should be terrified
> of restarting their management server, I think, especially as their
> environment grows and has many things going on. Anything that goes
> through a state machine could get stuck.
>
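
To make the cleanup-method idea concrete, here is a rough sketch of what it might look like. Every name in it (RecoverableJob, MigrateVolumeJob, recoverInterruptedJobs, the Volume class and its states) is made up for illustration and is not CloudStack's actual code. It assumes unfinished jobs are persisted somewhere and reloaded at startup, which is what would let it cover crashes as well as clean restarts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not CloudStack classes.
public class JobRecoverySketch {

    enum VolumeState { READY, MIGRATING, DESTROYED }

    static class Volume {
        final String name;
        VolumeState state = VolumeState.READY;
        Volume(String name) { this.name = name; }
    }

    // A long-running job that knows how to undo its own partial work.
    interface RecoverableJob {
        void cleanup();
    }

    static class MigrateVolumeJob implements RecoverableJob {
        final Volume source;
        final Volume destination;

        MigrateVolumeJob(Volume source, Volume destination) {
            this.source = source;
            this.destination = destination;
        }

        // Puts both volumes into the migrating state, as in the stuck case above.
        void begin() {
            source.state = VolumeState.MIGRATING;
            destination.state = VolumeState.MIGRATING;
        }

        // Roll back: the source becomes usable again and the
        // half-copied destination is marked for removal.
        @Override
        public void cleanup() {
            source.state = VolumeState.READY;
            destination.state = VolumeState.DESTROYED;
        }
    }

    // Intended to run at startup: every job still recorded as in progress
    // is treated as cancelled and rolled back. Returns the number recovered.
    static int recoverInterruptedJobs(List<RecoverableJob> unfinished) {
        int recovered = 0;
        for (RecoverableJob job : unfinished) {
            job.cleanup();
            recovered++;
        }
        unfinished.clear();
        return recovered;
    }

    public static void main(String[] args) {
        Volume src = new Volume("vol-src");
        Volume dst = new Volume("vol-dst");
        MigrateVolumeJob job = new MigrateVolumeJob(src, dst);
        job.begin();

        // Simulate the management server coming back up with one unfinished job.
        List<RecoverableJob> unfinished = new ArrayList<>();
        unfinished.add(job);
        recoverInterruptedJobs(unfinished);

        System.out.println(src.name + " is " + src.state);
        System.out.println(dst.name + " is " + dst.state);
    }
}
```

Running the rollback at startup rather than at shutdown is a deliberate choice in this sketch: a cleanup that only fires on a graceful stop would miss the crash case raised above.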
