Eroma created AIRAVATA-2378:
-------------------------------

             Summary: Jobs failing at execution of squeue command due to response of 'Invalid job ID'
                 Key: AIRAVATA-2378
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2378
             Project: Airavata
          Issue Type: Bug
          Components: Airavata System, PGA PHP Web Gateway
    Affects Versions: 0.17
         Environment: https://ultrascan.scigap.org/
            Reporter: Eroma
            Assignee: Suresh Marru
             Fix For: 0.17


When a job is submitted and the cluster returns a job ID, gfac executes the 
squeue command. When this command returns the queued job's details, gfac goes 
on to send the gateway user details to the XSEDE machines and also adds the job 
ID to the monitoring map.

Intermittently, SSH session validation takes longer after job submission, so by 
the time the squeue command is executed the job is no longer in the queue, and 
the error in [1] is returned.
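
The race described above could be handled by treating the "Invalid job id" response as "the job has already left the queue" rather than as a submission failure. The following is only a minimal sketch in Python, not Airavata's code: the function name classify_squeue_result, the Verification enum, and the assumption that the job ID is the first column of squeue output are all hypothetical and chosen for illustration. Only the error text is taken from the log in [1].

```python
import re
from enum import Enum


class Verification(Enum):
    """Hypothetical outcomes of the post-submission squeue check."""
    IN_QUEUE = "in_queue"      # squeue listed the job
    LEFT_QUEUE = "left_queue"  # job ID was accepted earlier but is no longer listed
    FAILED = "failed"          # the check itself failed (SSH error, etc.)


# Error text as it appears in the log in [1]; whitespace is matched loosely
# because the message may be wrapped when it is logged.
INVALID_JOB_ID = re.compile(
    r"slurm_load_jobs\s+error:\s+Invalid\s+job\s+id\s+specified",
    re.IGNORECASE,
)


def classify_squeue_result(exit_code, stdout, stderr, job_id):
    """Classify the result of running `squeue -j <job_id>`.

    Assumption: a job ID was already returned by the cluster at submission
    time, so "Invalid job id" means the job finished or was purged before
    the check ran, not that the submission failed.
    """
    if INVALID_JOB_ID.search(stderr):
        return Verification.LEFT_QUEUE
    if exit_code != 0:
        return Verification.FAILED
    # Assumption: the default squeue layout, with JOBID as the first column.
    for line in stdout.splitlines():
        fields = line.split()
        if fields and fields[0] == job_id:
            return Verification.IN_QUEUE
    # squeue succeeded but did not list the job.
    return Verification.LEFT_QUEUE


if __name__ == "__main__":
    err = "slurm_load_jobs error: Invalid job id specified"
    print(classify_squeue_result(1, "", err, "9119082"))
```

With a classification like this, a LEFT_QUEUE result could still be added to the monitoring map and left for the normal monitoring path to resolve, instead of raising an exception from the submission task. Whether that is the right fix for gfac is a design decision for the assignee.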


[1]
2017-05-02 06:27:48,047 [pool-7-thread-15] ERROR o.a.a.g.i.t.DefaultJobSubmissionTask process_id=PROCESS_c7e404ed-0822-404a-8f04-6b09e9ba8ece, token_id=75918c63-30fd-4548-a8d3-7f3a41b185ae, experiment_id=US3-AIRA_740b0ad6-62c4-42dc-9eed-f12b92a6b98b, gateway_id=Ultrascan_Production - Error occurred while submitting the job
org.apache.airavata.gfac.core.GFacException: Error running command squeue -j 9119082  on remote cluster. StandardError: slurm_load_jobs error: Invalid job id specified

        at org.apache.airavata.gfac.impl.HPCRemoteCluster.throwExceptionOnError(HPCRemoteCluster.java:298)
        at org.apache.airavata.gfac.impl.HPCRemoteCluster.getJobStatus(HPCRemoteCluster.java:233)
        at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.verifyJobSubmissionByJobId(DefaultJobSubmissionTask.java:302)
        at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.execute(DefaultJobSubmissionTask.java:157)
        at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
        at org.apache.airavata.gfac.impl.GFacEngineImpl.executeJobSubmission(GFacEngineImpl.java:510)
        at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:386)
        at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
        at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
        at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
        at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
