[ https://issues.apache.org/jira/browse/UIMA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lou DeGenaro updated UIMA-2641:
-------------------------------
Description:
Recently, a job was submitted but got stuck in the "Completing" state even
though all work items had completed. During the run, a Job Process (JP) was
launched and got stuck in the "Initializing" state: the machine on which it
(once) existed crashed, so the DUCC Agent responsible for it could neither
report its status nor take action. This stuck JP kept the job from advancing
to the "Completed" state.
The user issued the DUCC cancel command with the --dpid flag to cancel the
bogus JP, and the job then completed normally.
The orchestrator (OR) could detect this situation and handle it without human
(user) intervention.
Additionally, the Web Server (WS) could help by a) identifying those cases that
cannot be handled automatically and b) offering guidance on freeing up work
items held in limbo.
was:
Recently, a job was submitted but got stuck in state "Completing". All work
items had completed. During the run a Job Process (JP) was launched and got
stuck in Initializing state because the machine on which it (once) existed
crashed and (therefore) the DUCC Agent responsible was unable to report or take
action. This stuck JP was keeping the job from advancing to the "Completed"
state.
The user issued the DUCC cancel command using flag --dpid to cancel the bogus
JP and the job completed normally.
This situation could be detected by the orchestrator (OR) and handled w/o human
(user) intervention.
> DUCC orchestrator (OR) should mark unused, stubbornly alive Job Processes
> (JPs) as "Stopped" when all work items are accounted for...
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: UIMA-2641
> URL: https://issues.apache.org/jira/browse/UIMA-2641
> Project: UIMA
> Issue Type: Improvement
> Reporter: Lou DeGenaro
> Assignee: Lou DeGenaro
> Priority: Minor
>
> Recently, a job was submitted but got stuck in the "Completing" state even
> though all work items had completed. During the run, a Job Process (JP) was
> launched and got stuck in the "Initializing" state: the machine on which it
> (once) existed crashed, so the DUCC Agent responsible for it could neither
> report its status nor take action. This stuck JP kept the job from advancing
> to the "Completed" state.
> The user issued the DUCC cancel command with the --dpid flag to cancel the
> bogus JP, and the job then completed normally.
> The orchestrator (OR) could detect this situation and handle it without
> human (user) intervention.
> Additionally, the Web Server (WS) could help by a) identifying those cases
> that cannot be handled automatically and b) offering guidance on freeing up
> work items held in limbo.
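
The automatic handling proposed in the description could be sketched roughly
as follows. This is only an illustration of the rule "once all work items are
accounted for, mark any JP that is still alive but holds no work as Stopped".
The class, enum, field, and method names (StuckJpSketch, ProcessState,
JobProcess, stopUnusedProcesses) are hypothetical and are not taken from the
DUCC code base; the real orchestrator state model is likely richer.

```java
import java.util.ArrayList;
import java.util.List;

public class StuckJpSketch {

    // Hypothetical, simplified process states (not DUCC's actual types).
    enum ProcessState { INITIALIZING, RUNNING, STOPPED }

    // Hypothetical, simplified view of a Job Process (JP).
    static final class JobProcess {
        final String dpid;       // DUCC process id, as used with --dpid
        ProcessState state;
        int workItemsInFlight;   // work items currently dispatched to this JP

        JobProcess(String dpid, ProcessState state, int workItemsInFlight) {
            this.dpid = dpid;
            this.state = state;
            this.workItemsInFlight = workItemsInFlight;
        }
    }

    /**
     * If every work item of the job is accounted for, marks each JP that is
     * not yet STOPPED and holds no work items as STOPPED.
     * Returns the dpids of the JPs that were marked.
     */
    static List<String> stopUnusedProcesses(int totalWorkItems,
                                            int accountedWorkItems,
                                            List<JobProcess> processes) {
        List<String> stopped = new ArrayList<>();
        if (accountedWorkItems < totalWorkItems) {
            return stopped; // work remains, so idle JPs may still be needed
        }
        for (JobProcess jp : processes) {
            if (jp.state != ProcessState.STOPPED && jp.workItemsInFlight == 0) {
                jp.state = ProcessState.STOPPED;
                stopped.add(jp.dpid);
            }
        }
        return stopped;
    }
}
```

In this sketch a JP that still holds a work item is left alone, which
corresponds to the cases the description says cannot be handled automatically
and would instead need guidance from the WS.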
--
This message is automatically generated by JIRA.