I run into this problem often as well, and I got exactly the same result
with the proposed solution: setting the status to 0 for stuck jobs causes
them to fail.  To make matters worse, they fail without creating a failed
zip file to reingest.

Is there a way to resume stuck jobs?

[email protected] wrote on 10/04/2011 08:38:56 AM:

> 
> Hi Tobias,
> 
> I tried your procedure on our server since 2 jobs were "stuck" and not 
> moving on.
> 
> This is what I got:
> 2011-10-04 15:32:27  WARN (WorkflowServiceImpl:1426) - Exception while accepting job Job {id:4653, version:31}
> java.lang.IllegalStateException: Cannot start a workflow in state 'RUNNING'
>          at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflow(WorkflowServiceImpl.java:648)
>          at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1389)
>          at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:1717)
>          at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:1690)
>          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>          at java.lang.Thread.run(Thread.java:619)
> 
> And now I have 2 failed recordings on the "Recordings" screen.
> 
> What do you recommend to get these jobs active again and continue 
> their processing where they ended last?
> 
> One of these was a 3-hour lecture. Matterhorn worked on it for about
> 12 hours and had almost finished processing (it was extracting text
> segments when it hung), so any input on how to get Matterhorn to
> finish where it stopped is welcome.
> 
> Thanks, Andreas
> 
> Tobias Wunden wrote on Tue, 30 Aug 2011, subject "Re: [Opencast] 
> Resuming...":
> > Date: Tue, 30 Aug 2011 16:14:37 +0200
> > From: Tobias Wunden <[email protected]>
> > Reply-To: Opencast Community <[email protected]>
> > To: Opencast Community <[email protected]>
> > Subject: Re: [Opencast] Resuming video processing after a Reboot
> > 
> > Hi Nathan,
> >
> > there is one service in Matterhorn (the service registry) that keeps
> > track of the workflows being executed. That service not only knows
> > which state a workflow is in, it also knows which host it is running
> > on. So what you need to do is tell the service registry to restart
> > all the jobs that are currently marked as "running" on the affected
> > machines.
> >
> > Unfortunately, there has not been time so far to add this to the UI,
> > so you will need to do this manually by updating the workflow's
> > running status in the database.
> >
> > 1) You can find the affected workflows by issuing
> >
> > SELECT j.id
> > FROM job j, service_registration s, host_registration h
> > WHERE host = 'http://x.y.z'
> >   AND j.status = 2
> >   AND j.operation = 'START_WORKFLOW'
> >   AND j.processor_svc = s.id
> >   AND s.host_reg = h.id;
> >
> > which basically translates to "find me every job that started a
> > workflow which is still marked as running on host x.y.z".
> >
> > 2) After that, it should be as easy as making sure that job is
> > restarted by setting the status to "queued":
> >
> > UPDATE job
> > SET status = 0
> > WHERE id IN (
> >   SELECT j.id
> >   FROM job j, service_registration s, host_registration h
> >   WHERE host = 'http://x.y.z'
> >     AND j.status = 2
> >     AND j.operation = 'START_WORKFLOW'
> >     AND j.processor_svc = s.id
> >     AND s.host_reg = h.id
> > );
> >
> > Tobias
> >
> > On 30.08.2011, at 12:59, Nathan Cameron wrote:
> >
> >> Hello all,
> >>   Yesterday the core computer in our system that handles video
> >> processing and distribution got overloaded and the matterhorn
> >> service stopped altogether.  I knew of no alternative but to restart
> >> the service.  It was processing several recordings when this
> >> happened.  Upon restarting the web UI many of the recordings
> >> initially showed they were in the same place they were before, and a
> >> few failed completely.  It's been approximately 7 hours since I did
> >> the restart, and none of the recordings' states have changed.
> >>
> >> My question then:  Is there some way to force the core to resume
> >> processing on half processed files?
> >>
> >> I'm also wondering if there is a way to take one of the raw
> >> capture folders on a given capture agent and upload it to the media
> >> module.  For example, the recordings that failed have full audio and
> >> video.  I know they would have successfully processed apart from the
> >> system error.  How do I take one of those folders from the capture
> >> agent (like a 2677) and get it to retry?  Is there a specific file
> >> from the folder I must upload?
> >>
> >> Any help here is appreciated.  Until I can upgrade some more of
> >> our hardware I'm going to be running into these issues.
> >>
> >> Thank You,
> >> Nathan
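
The two statements quoted above can be tried out without touching a
production database. What follows is only a minimal Python sketch that
runs them against an in-memory SQLite database. The table and column
names are the ones mentioned in the thread, but the schema is a
stripped-down assumption (the real Matterhorn tables have more columns),
the helper names find_stuck_jobs, requeue_stuck_jobs and demo_db are
hypothetical, and the sample rows are made up for illustration. Status
2 = running and 0 = queued are taken from Tobias's message; status 3 in
the sample data simply stands for "some other status". The update is
written with an IN subquery so that only the selected jobs are touched.

```python
import sqlite3

# Assumed minimal schema: only the columns named in the thread.
SCHEMA = """
CREATE TABLE host_registration (id INTEGER PRIMARY KEY, host TEXT);
CREATE TABLE service_registration (id INTEGER PRIMARY KEY, host_reg INTEGER);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    status INTEGER,
    operation TEXT,
    processor_svc INTEGER
);
"""

# Step 1 from the thread: workflow-start jobs still marked as running on a host.
STUCK_JOBS = """
SELECT j.id
FROM job j, service_registration s, host_registration h
WHERE host = ?
  AND j.status = 2
  AND j.operation = 'START_WORKFLOW'
  AND j.processor_svc = s.id
  AND s.host_reg = h.id
"""


def find_stuck_jobs(conn, host):
    """Return the sorted ids of jobs selected by step 1 for the given host."""
    return sorted(row[0] for row in conn.execute(STUCK_JOBS, (host,)))


def requeue_stuck_jobs(conn, host):
    """Step 2 from the thread: set status = 0 on exactly the jobs from step 1."""
    cur = conn.execute(
        f"UPDATE job SET status = 0 WHERE id IN ({STUCK_JOBS})", (host,)
    )
    conn.commit()
    return cur.rowcount  # number of rows changed


def demo_db():
    """Build a throwaway database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO host_registration VALUES (?, ?)",
        [(1, "http://x.y.z"), (2, "http://other.host")],
    )
    conn.executemany(
        "INSERT INTO service_registration VALUES (?, ?)", [(10, 1), (20, 2)]
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?, ?)",
        [
            (4653, 2, "START_WORKFLOW", 10),  # running on x.y.z: selected
            (4654, 2, "START_WORKFLOW", 20),  # running on another host
            (4655, 3, "START_WORKFLOW", 10),  # some other status
            (4656, 2, "OTHER_OPERATION", 10),  # not a workflow start
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print("stuck before:", find_stuck_jobs(conn, "http://x.y.z"))
    print("rows requeued:", requeue_stuck_jobs(conn, "http://x.y.z"))
    print("stuck after:", find_stuck_jobs(conn, "http://x.y.z"))
```

Keep in mind that, as reported at the top of this thread, resetting the
status this way led to "Cannot start a workflow in state 'RUNNING'" and
failed recordings on at least two installations, so the sketch only
shows what the queries select and change, not a confirmed way to resume
a workflow.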
_______________________________________________
Community mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/community


To unsubscribe please email
[email protected]
_______________________________________________
