Eroma created AIRAVATA-2941:
-------------------------------

             Summary: Experiments fail to submit jobs to HPC cluster queues 
due to the queue reaching the max job limit per user.
                 Key: AIRAVATA-2941
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
             Project: Airavata
          Issue Type: Bug
          Components: GFac, helix implementation
    Affects Versions: 0.18
         Environment: https://staging.ultrascan.scigap.org & 
https://ultrascan.scigap.org/ 
            Reporter: Eroma
            Assignee: Dimuthu Upeksha
             Fix For: 0.18


Currently experiments fail as follows:
 # The HPC queue reaches the maximum number of jobs allowed per user.
 # Job submission fails and the HPC cluster returns a rejection response 
[1]; Airavata tags the experiment as FAILED.
 # The only option for the gateway user is to submit the experiment again.

The required fix is for Airavata to have internal queues, or some other 
mechanism to hold such experiments until the HPC queue can accept jobs, 
instead of marking the experiment FAILED.
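The requeue behavior described above could be sketched as follows. This is a minimal illustration, not the actual Airavata/GFac API: the names `ExperimentQueue`, `classify_response`, and the returned state strings are all hypothetical, and the rejection pattern is taken from the Stampede2 response quoted in [1].

```python
import re
from collections import deque

# Pattern matching a per-user job-limit rejection, as seen in the
# Stampede2 output quoted below in [1]. (Illustrative only; real
# schedulers vary in their rejection messages.)
MAX_JOBS_PATTERN = re.compile(
    r"Enforcing max jobs per user\.\.\.FAILED|Too many simultaneous jobs",
    re.IGNORECASE,
)

def classify_response(response: str) -> str:
    """Return 'RETRY' for a queue-limit rejection, 'FAIL' otherwise."""
    return "RETRY" if MAX_JOBS_PATTERN.search(response) else "FAIL"

class ExperimentQueue:
    """Hypothetical internal holding queue for limit-rejected experiments."""

    def __init__(self):
        self._pending = deque()

    def handle_submission(self, experiment_id: str, response: str) -> str:
        """Queue the experiment for retry instead of failing it outright."""
        if classify_response(response) == "RETRY":
            # Hold the experiment; a background scheduler would resubmit
            # it once the HPC queue has free slots for this user.
            self._pending.append(experiment_id)
            return "QUEUED"
        # Genuine submission errors still fail the experiment.
        return "FAILED"

    def next_to_resubmit(self):
        """Pop the next held experiment, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None
```

Under this sketch, only responses that match the known limit-rejection pattern are held for retry; any other submission failure is still surfaced to the user as FAILED.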

 

[1] This example is from Stampede2:

-----------------------------------------------------------------
           Welcome to the Stampede2 Supercomputer
-----------------------------------------------------------------
No reservation for this job
--> Verifying valid submit host (login3)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...FAILED
[*] Too many simultaneous jobs in queue.
--> Max job limits for us3 = 50 jobs

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
