I run into this problem often as well, and I got exactly the same result with the proposed solution: setting the status to 0 for stuck jobs causes them to fail. To make matters worse, they fail without creating a failed zip file to reingest.
Is there a way to resume stuck jobs?

[email protected] wrote on 10/04/2011 08:38:56 AM:

> Hi Tobias,
>
> I tried your procedure on our server since 2 jobs were "stuck" and not moving on.
>
> This is what I got:
> 2011-10-04 15:32:27 WARN (WorkflowServiceImpl:1426) - Exception while accepting job Job {id:4653, version:31}
> java.lang.IllegalStateException: Cannot start a workflow in state 'RUNNING'
>     at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflow(WorkflowServiceImpl.java:648)
>     at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1389)
>     at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:1717)
>     at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:1690)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
> And now I have 2 failed recordings on the "Recordings" screen.
>
> What do you recommend to get these jobs active again and continue their processing where they ended last?
>
> One of these was a 3-hour lecture. Matterhorn worked on it for about 12 hours and had almost finished processing (it was extracting text segments when it hung), so any input on how to get Matterhorn to finish where it stopped is welcome.
>
> Thanks, Andreas
>
> Tobias Wunden wrote on Tue, 30 Aug 2011, subject "Re: [Opencast] Resuming...":
> > Date: Tue, 30 Aug 2011 16:14:37 +0200
> > From: Tobias Wunden <[email protected]>
> > Reply-To: Opencast Community <[email protected]>
> > To: Opencast Community <[email protected]>
> > Subject: Re: [Opencast] Resuming video processing after a Reboot
> >
> > Hi Nathan,
> >
> > there is one service in Matterhorn (the service registry) which keeps track of the workflows that are being executed. That service not only knows which state a workflow is in, it also knows on which host it is running. So what you need to do is tell the service registry to restart all the jobs that are currently marked as "running" on the affected machines.
> >
> > Unfortunately, there has not been time so far to add this to the UI, so you will need to do this manually by updating the workflow's running status in the database.
> >
> > 1) You can find the affected workflows by issuing
> >
> > SELECT j.id
> > FROM job j, service_registration s, host_registration h
> > WHERE host = 'http://x.y.z'
> > AND j.status = 2
> > AND j.operation = 'START_WORKFLOW'
> > AND j.processor_svc = s.id
> > AND s.host_reg = h.id;
> >
> > which basically translates to "find me every job that started a workflow which is still marked as running on host x.y.z".
> >
> > 2) After that it should be as easy as making sure that job is restarted by setting the status to "queued":
> >
> > UPDATE job
> > SET status = 0
> > FROM job j, service_registration s, host_registration h
> > WHERE host = 'http://x.y.z'
> > AND j.status = 2
> > AND j.operation = 'START_WORKFLOW'
> > AND j.processor_svc = s.id
> > AND s.host_reg = h.id;
> >
> > Tobias
> >
> > On 30.08.2011, at 12:59, Nathan Cameron wrote:
> >
> >> Hello all,
> >> Yesterday the core computer in our system that handles video processing and distribution got overloaded and the Matterhorn service stopped altogether.
> >> I knew of no alternative but to restart the service. It was processing several recordings when this happened. Upon restarting the web UI many of the recordings initially showed they were in the same place they were before, and a few failed completely. It's been approximately 7 hours since I did the restart, and none of the recordings' states have changed.
> >>
> >> My question then: Is there some way to force the core to resume processing on half processed files?
> >>
> >> I'm also wondering if there is a way to take one of the raw capture folders on a given capture agent and upload it to the media module. For example, the recordings that failed have full audio and video. I know they would have successfully processed apart from the system error. How do I take one of those folders from the capture agent (like a 2677) and get it to retry? Is there a specific file from the folder I must upload?
> >>
> >> Any help here is appreciated. Until I can upgrade some more of our hardware I'm going to be running into these issues.
> >>
> >> Thank You,
> >> Nathan
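
For anyone who wants to see which jobs the SELECT in step 1 would match before changing anything, here is a minimal read-only sketch in Python. It is only an illustration under assumptions: the three tables contain just the columns named in the quoted query, the in-memory SQLite database and its sample rows are made up for the demo, and the helper names (find_stuck_jobs, make_demo_db) are hypothetical. It does not touch a real Matterhorn database and does not perform the UPDATE from step 2, which two posters in this thread report causes the jobs to fail.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in the quoted query.
SCHEMA = """
CREATE TABLE host_registration (id INTEGER PRIMARY KEY, host TEXT);
CREATE TABLE service_registration (id INTEGER PRIMARY KEY, host_reg INTEGER);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    status INTEGER,
    operation TEXT,
    processor_svc INTEGER
);
"""

# Same conditions as the quoted SELECT, written with explicit joins
# and a bound parameter for the host.
STUCK_JOBS_SQL = """
SELECT j.id
FROM job j
JOIN service_registration s ON j.processor_svc = s.id
JOIN host_registration h ON s.host_reg = h.id
WHERE h.host = ?
  AND j.status = 2
  AND j.operation = 'START_WORKFLOW'
ORDER BY j.id
"""


def find_stuck_jobs(conn, host):
    """Return ids of START_WORKFLOW jobs with status 2 on the given host."""
    return [row[0] for row in conn.execute(STUCK_JOBS_SQL, (host,))]


def make_demo_db():
    """Build an in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO host_registration VALUES (?, ?)",
        [(1, "http://x.y.z"), (2, "http://other.host")],
    )
    conn.executemany(
        "INSERT INTO service_registration VALUES (?, ?)",
        [(10, 1), (20, 2)],
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?, ?)",
        [
            (4653, 2, "START_WORKFLOW", 10),  # matches on http://x.y.z
            (4654, 2, "START_WORKFLOW", 20),  # same state, different host
            (4655, 3, "START_WORKFLOW", 10),  # different status value
            (4656, 2, "Encode", 10),          # different operation
        ],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(find_stuck_jobs(demo, "http://x.y.z"))
```

Listing the ids first at least makes it possible to note which jobs were affected before deciding how to proceed.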
