Eroma created AIRAVATA-2378:
-------------------------------
Summary: Jobs failing at execution of squeue command due to response of 'Invalid job ID'
Key: AIRAVATA-2378
URL: https://issues.apache.org/jira/browse/AIRAVATA-2378
Project: Airavata
Issue Type: Bug
Components: Airavata System, PGA PHP Web Gateway
Affects Versions: 0.17
Environment: https://ultrascan.scigap.org/
Reporter: Eroma
Assignee: Suresh Marru
Fix For: 0.17
When a job is submitted and the cluster returns a job ID, gfac executes the
squeue command. When this command returns the queued job details, gfac sends
the gateway user details to the XSEDE machines and also adds the job ID to the
monitoring map.
Intermittently, the SSH session validation takes longer than usual after job
submission, so by the time the squeue command is executed the job is no longer
in the queue and an error is returned [1].
[1]
2017-05-02 06:27:48,047 [pool-7-thread-15] ERROR o.a.a.g.i.t.DefaultJobSubmissionTask process_id=PROCESS_c7e404ed-0822-404a-8f04-6b09e9ba8ece, token_id=75918c63-30fd-4548-a8d3-7f3a41b185ae, experiment_id=US3-AIRA_740b0ad6-62c4-42dc-9eed-f12b92a6b98b, gateway_id=Ultrascan_Production - Error occurred while submitting the job
org.apache.airavata.gfac.core.GFacException: Error running command squeue -j 9119082 on remote cluster. StandardError: slurm_load_jobs error: Invalid job id specified
    at org.apache.airavata.gfac.impl.HPCRemoteCluster.throwExceptionOnError(HPCRemoteCluster.java:298)
    at org.apache.airavata.gfac.impl.HPCRemoteCluster.getJobStatus(HPCRemoteCluster.java:233)
    at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.verifyJobSubmissionByJobId(DefaultJobSubmissionTask.java:302)
    at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.execute(DefaultJobSubmissionTask.java:157)
    at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
    at org.apache.airavata.gfac.impl.GFacEngineImpl.executeJobSubmission(GFacEngineImpl.java:510)
    at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:386)
    at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
    at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
    at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
    at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
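One possible way to handle the race described above is to treat the specific
"Invalid job id" response from squeue as "the job has already left the queue"
instead of as a submission failure. The following is only a minimal sketch
under that assumption, not the actual gfac code: the class
SqueueVerificationSketch, the Verification enum and the classify method are
hypothetical names introduced for illustration, and the only detail taken from
the log is the stderr text "slurm_load_jobs error: Invalid job id specified".

```java
import java.util.Locale;

// Hypothetical sketch: none of these names exist in Airavata.
final class SqueueVerificationSketch {

    /** Possible outcomes of verifying a freshly submitted job with squeue. */
    enum Verification { IN_QUEUE, LEFT_QUEUE, ERROR }

    /**
     * Classifies the result of running "squeue -j <jobId>" on the cluster.
     *
     * @param exitCode exit status of the remote command
     * @param stdout   standard output of the remote command
     * @param stderr   standard error of the remote command
     * @param jobId    job ID returned by the cluster at submission time
     */
    static Verification classify(int exitCode, String stdout, String stderr, String jobId) {
        // Rough heuristic: the job is still queued or running if squeue
        // succeeded and its output mentions the job ID.
        if (exitCode == 0 && stdout != null && stdout.contains(jobId)) {
            return Verification.IN_QUEUE;
        }
        // Assumption: this message means the job is already gone from the
        // queue, so it is not reported as a failed submission.
        if (stderr != null
                && stderr.toLowerCase(Locale.ROOT).contains("invalid job id")) {
            return Verification.LEFT_QUEUE;
        }
        // Anything else is still treated as a real error.
        return Verification.ERROR;
    }

    public static void main(String[] args) {
        String stderr = "slurm_load_jobs error: Invalid job id specified";
        // Prints LEFT_QUEUE for the stderr text seen in the log above.
        System.out.println(classify(1, "", stderr, "9119082"));
    }
}
```

With a classification like this, the LEFT_QUEUE case could still add the job
ID to the monitoring map and leave the final status to the monitoring step.
Note that the same message might also be returned for a job ID that never
existed, so a follow-up check (for example against accounting records) would
probably be needed before assuming the job completed.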