Marcus Christie created AIRAVATA-2327:
-----------------------------------------
Summary: Process status messages lost by orchestrator
Key: AIRAVATA-2327
URL: https://issues.apache.org/jira/browse/AIRAVATA-2327
Project: Airavata
Issue Type: Bug
Components: Airavata Orchestrator
Affects Versions: 0.17
Reporter: Marcus Christie
Assignee: Shameera Rathnayaka
Fix For: 0.18
Zhong with the dREG gateway reported an experiment whose status was "stuck"
in EXECUTING even though the job had status COMPLETED. It looks like
the api-orch service on gw56 was shut down, probably at the same time that
the orchestrator was handling the COMPLETED process status message. The
process status subscriber [automatically acks
messages|https://github.com/apache/airavata/blob/3f29cfdbd71de18777557713dce58007a3cbc2f5/modules/messaging/core/src/main/java/org/apache/airavata/messaging/core/MessagingFactory.java#L120]
so the message was removed from the queue as soon as it was delivered and was
no longer available when the orchestrator was restarted.
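The suspected failure mode can be illustrated with a toy in-memory queue. This is only a sketch, not the Airavata or RabbitMQ client code: ToyQueue, AckDemo and the message text are hypothetical names, and the model assumes a broker that requeues unacknowledged deliveries when the consumer goes away (as RabbitMQ does with manual acknowledgements).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a broker queue; hypothetical, not the RabbitMQ client API.
class AckDemo {
    static class ToyQueue {
        private final Deque<String> ready = new ArrayDeque<>();
        private String unacked; // delivered but not yet acknowledged

        void publish(String msg) {
            ready.addLast(msg);
        }

        // With autoAck the broker forgets the message as soon as it is delivered.
        String deliver(boolean autoAck) {
            String msg = ready.pollFirst();
            if (msg != null && !autoAck) {
                unacked = msg;
            }
            return msg;
        }

        // Called by the consumer once its handler has finished successfully.
        void ack() {
            unacked = null;
        }

        // Consumer went away: any unacknowledged delivery is put back on the queue.
        void consumerDied() {
            if (unacked != null) {
                ready.addFirst(unacked);
                unacked = null;
            }
        }

        int size() {
            return ready.size();
        }
    }

    // Returns how many messages are still available to a restarted consumer
    // when the first consumer dies before its handler finishes.
    static int messagesLeftAfterCrash(boolean autoAck) {
        ToyQueue queue = new ToyQueue();
        queue.publish("PROCESS_x COMPLETED"); // hypothetical message text
        queue.deliver(autoAck);
        // The shutdown interrupts the handler here, so ack() is never called.
        queue.consumerDied();
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("auto-ack:   " + messagesLeftAfterCrash(true));
        System.out.println("manual ack: " + messagesLeftAfterCrash(false));
    }
}
```

In this model the auto-ack case leaves nothing for the restarted consumer, while the manual-ack case leaves the COMPLETED message in the queue to be redelivered.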
In gfac's log, the process completes at 2017-02-17 13:41:01:
{noformat}
2017-02-17 13:41:01 [pool-9-thread-11] INFO o.a.a.g.core.context.ProcessContext
- expId: Clone_of_2M_data_82c732b8-5bd5-4e24-b1cc-ce3fd480d677, processId:
PROCESS_3b22553a-b9ed-4250-a1dd-8b555ecede80 :- Process status changed
OUTPUT_DATA_S
{noformat}
api-orch was shut down and restarted several times around the same time:
{noformat}
2017-02-17 13:37:03 [main] INFO o.a.a.api.server.AiravataAPIServer - API server
started over TLS on Port: 9930 ...
...
2017-02-17 13:40:23 [main] INFO o.a.a.api.server.AiravataAPIServer - API server
started over TLS on Port: 9930 ...
...
2017-02-17 13:43:02 [main] INFO o.a.a.api.server.AiravataAPIServer - API server
started over TLS on Port: 9930 ...
...
2017-02-17 13:48:23 [main] INFO o.a.a.api.server.AiravataAPIServer - API server
started over TLS on Port: 9930 ...
...
2017-02-17 14:10:58 [main] INFO o.a.a.api.server.AiravataAPIServer - API server
started over TLS on Port: 9930 ...
{noformat}
A couple of solution ideas:
* make the process status queue subscriber acknowledge messages explicitly, only
after the orchestrator has finished handling them, instead of auto-acking
* have the orchestrator check the process status in the registry for every
incomplete experiment when it starts up
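
The second idea could look roughly like the sketch below. It is an illustration only: StartupReconciler, the State enum and the map-based "registry" are hypothetical stand-ins for the real registry API, and it assumes one process per experiment for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical startup check; the maps stand in for registry lookups.
class StartupReconciler {
    enum State { EXECUTING, COMPLETED, FAILED }

    // Returns the ids of experiments that are still marked EXECUTING although
    // their process has already reached a terminal state in the registry.
    static List<String> findStaleExperiments(Map<String, State> experimentStates,
                                             Map<String, State> processStates) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, State> entry : experimentStates.entrySet()) {
            State processState = processStates.get(entry.getKey());
            boolean experimentOpen = entry.getValue() == State.EXECUTING;
            boolean processDone = processState == State.COMPLETED
                    || processState == State.FAILED;
            if (experimentOpen && processDone) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, State> experiments = new LinkedHashMap<>();
        experiments.put("exp-1", State.EXECUTING); // status message was lost
        experiments.put("exp-2", State.EXECUTING); // genuinely still running
        experiments.put("exp-3", State.COMPLETED);

        Map<String, State> processes = new LinkedHashMap<>();
        processes.put("exp-1", State.COMPLETED);
        processes.put("exp-2", State.EXECUTING);
        processes.put("exp-3", State.COMPLETED);

        // On startup the orchestrator would re-handle the status of each stale id.
        System.out.println(findStaleExperiments(experiments, processes));
    }
}
```

The two ideas are complementary: explicit acks keep an undelivered status message in the queue, while the startup check repairs experiments whose message is already gone.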
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)