Dear Suresh and Andrei,
Thanks for your help.
I have upgraded CloudStack from 4.9.3 to 4.11.2, but the problem still
persists.
Then I inspected the database tables and found that these three tables could
be the root cause:
- op_ha_work
- op_lock
- vm_work_job
So I deleted all records in those tables, and the problem was solved.
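For anyone repeating this, here is a minimal sketch of the cleanup in MySQL.
It assumes the database is named cloud (the CloudStack default) and that the
management server is stopped first; the _bak table names are hypothetical and
only there to keep a copy of the rows.

```sql
-- Assumption: the CloudStack database is named `cloud`.
USE cloud;

-- Keep a copy of the rows before deleting them (backup names are hypothetical).
CREATE TABLE op_ha_work_bak AS SELECT * FROM op_ha_work;
CREATE TABLE op_lock_bak AS SELECT * FROM op_lock;
CREATE TABLE vm_work_job_bak AS SELECT * FROM vm_work_job;

-- Clear the three tables.
DELETE FROM op_ha_work;
DELETE FROM op_lock;
DELETE FROM vm_work_job;
```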
The contents of those tables are posted as a comment on the bug report in
Jira:
https://issues.apache.org/jira/browse/CLOUDSTACK-10401
Suresh, could you tell me more about the role of those tables in CS?
I think CS has become more sensitive to concurrent jobs. Previous versions
worked better.
Regards

On Wed, Jan 23, 2019 at 9:43 PM Suresh Kumar Anaparti <
sureshkumar.anapa...@gmail.com> wrote:

> Hi Alireza,
>
> The *sync_queue* table is the actual VM sync queue, which holds a queue id
> for each VM (*sync_objtype*: VmWorkJobQueue, *sync_objid*: <VM-Id>), and the
> VM jobs reside in the *sync_queue_item* table against that queue id. Only
> one running job is allowed per VM queue (*queue_size_limit*: 1 in the
> *sync_queue* table). The active/running job has *queue_proc_id*,
> *queue_proc_number* and *queue_proc_time* set in the *sync_queue_item*
> table, and the rest of the jobs with that queue id wait for the active job
> to complete. So, to delete pending jobs, the records in the
> *sync_queue_item* table have to be cleared for the respective VMs, not the
> *sync_queue* table.
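>
> A minimal sketch of that cleanup, assuming *sync_queue_item* points at its
> queue through a *queue_id* column (an assumption, not confirmed above) and
> with XXXXX standing in for the id of the affected VM:
>
> ```sql
> -- Placeholder: replace XXXXX with the id of the affected VM.
> SET @id = XXXXX;
>
> -- Inspect the items queued for this VM first.
> SELECT i.*
> FROM sync_queue_item i
> JOIN sync_queue q ON q.id = i.queue_id
> WHERE q.sync_objtype = 'VmWorkJobQueue' AND q.sync_objid = @id;
>
> -- Clear the queue items for this VM; the sync_queue row itself stays.
> DELETE i
> FROM sync_queue_item i
> JOIN sync_queue q ON q.id = i.queue_id
> WHERE q.sync_objtype = 'VmWorkJobQueue' AND q.sync_objid = @id;
> ```
>
> It is probably safest to run this with the management server stopped and
> after taking a database backup.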
>
> I think, in your case, the snapshots are taking a long time and the other
> jobs for that VM stay pending because they are queued behind the snapshot
> job. What are the config values set for
> "job.cancel.threshold.minutes", "job.expire.minutes" and
> "volume.snapshot.job.cancel.threshold"? Are the jobs cancelled after the
> threshold time?
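>
> The current values can be read straight from the database with something
> like the query below; it assumes the global settings are stored in the
> *configuration* table with *name* and *value* columns:
>
> ```sql
> SELECT name, value
> FROM configuration
> WHERE name IN ('job.cancel.threshold.minutes',
>                'job.expire.minutes',
>                'volume.snapshot.job.cancel.threshold');
> ```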
>
> Thanks,
> Suresh
>
> On Wed, Jan 23, 2019 at 7:14 PM Andrei Mikhailovsky
> <and...@arhont.com.invalid> wrote:
>
> > Hi
> >
> > I've had this issue a few times in 2018 and managed to get it fixed
> > pretty easily, although I had spent a number of hours initially trying
> > to figure out WTF is going on. This issue looks like one of those
> > artefacts that crept into one of the versions released in 2018 and
> > hasn't been addressed by the dev team.
> >
> > The way I fixed it was similar to what has been recommended earlier.
> > However, the difference was that I am sure I looked at more tables than
> > just the two suggested. Basically, I stopped the management server,
> > created an SQL backup, connected to the SQL db and listed all tables,
> > grepping for words like job/schedule/queue/sync. After that I went
> > through all those tables and pretty much removed all the past / active /
> > awaiting-execution jobs. I started by looking at the jobs related to the
> > vm that I had tried to start but wasn't able to. This worked once,
> > but the second time I had to remove a lot more jobs, which related to
> > other vms. After that I started the management server and all went well
> > from there.
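> >
> > Finding the candidate tables can also be done inside MySQL rather than
> > by grepping. A rough sketch, assuming the database is named cloud:
> >
> > ```sql
> > -- List tables whose names suggest job/queue bookkeeping.
> > SELECT table_name
> > FROM information_schema.tables
> > WHERE table_schema = 'cloud'
> >   AND (table_name LIKE '%job%'
> >     OR table_name LIKE '%schedule%'
> >     OR table_name LIKE '%queue%'
> >     OR table_name LIKE '%sync%');
> > ```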
> >
> > What I have also noticed is that my snapshot jobs (I use KVM and Ceph)
> > seem to be blocking jobs on the hypervisor hosts which are running these
> > snapshots. So, if I am trying to perform various vm related jobs on a
> > host server which is currently running a snapshot process, that job will
> > not be executed until the snapshot process is done. I've tested this
> > countless times and it's still the case. Again, this issue appeared in
> > one of the 2018 releases, as I never saw it between 2012 and 2017.
> >
> > Both issues are annoying as hell!
> >
> > Cheers
> >
> > ----- Original Message -----
> > > From: "Alireza Eskandari" <astro.alir...@gmail.com>
> > > To: "dev" <dev@cloudstack.apache.org>
> > > Sent: Wednesday, 23 January, 2019 12:40:48
> > > Subject: Re: Help! Jobs stuck in pending state
> >
> > > I'm following this issue on GitHub:
> > > https://github.com/apache/cloudstack/issues/3104
> > > Please leave your comments
> > > Thanks
> > >
> > > On Wed, Jan 23, 2019 at 12:39 PM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> > >
> > >> Hi Alireza,
> > >>
> > >> Could you try again after restarting the management server?
> > >>
> > >> -Wei
> > >>
> > >> Alireza Eskandari <astro.alir...@gmail.com> wrote on Wed, Jan 23, 2019 at 6:22 AM:
> > >>
> > >> > First I deleted two jobs that existed in the vm_work_job table and
> > >> > the related entry in the sync_queue table, but it didn't help.
> > >> > Then I deleted all the entries in the sync_queue table, again with no
> > >> > success.
> > >> > Any idea?
> > >> >
> > >> > On Wed, Jan 23, 2019 at 1:50 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> > >> >
> > >> > > If you know the instance id and MySQL password, it should work
> > >> > > after removing some records in MySQL.
> > >> > >
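> > >> > > ```
> > >> > > -- Replace XXXXX with the id of the stuck VM instance.
> > >> > > set @id=XXXXX;
> > >> > >
> > >> > > -- Remove the VM's work jobs and its sync queue entry.
> > >> > > delete from vm_work_job where vm_instance_id=@id;
> > >> > > delete from sync_queue where sync_objid=@id;
> > >> > > ```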
> > >> > >
> > >> > > Alireza Eskandari <astro.alir...@gmail.com> wrote on Tue, Jan 22, 2019 at 10:59 PM:
> > >> > >
> > >> > > > Hi guys
> > >> > > > I have opened a bug in jira about my problem in CS:
> > >> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-10401
> > >> > > > CloudStack doesn't process jobs! My cloud is totally unusable.
> > >> > > > Thanks in advance for your help.
> > >> > > >
> > >> > >
> > >> >
> >
>
