Eroma created AIRAVATA-2941:
-------------------------------
Summary: Experiments fail to submit jobs to HPC cluster queues when the
queue reaches the per-user maximum job limit.
Key: AIRAVATA-2941
URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
Project: Airavata
Issue Type: Bug
Components: GFac, helix implementation
Affects Versions: 0.18
Environment: https://staging.ultrascan.scigap.org &
https://ultrascan.scigap.org/
Reporter: Eroma
Assignee: Dimuthu Upeksha
Fix For: 0.18
Currently experiments fail when
# The HPC queue reaches the maximum number of jobs allowed per user.
# The job submission fails and the HPC cluster sends a job submission
error response [1]; Airavata then tags the experiment as FAILED.
# The only option for the gateway user is to submit the experiment again.
The required fix is for Airavata to have internal queues, or another way to
hold such experiments until the HPC queue can accept jobs, rather than
FAILING the experiment.
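One possible approach (a minimal sketch only, not Airavata's actual GFac/Helix implementation; the class and method names below are hypothetical) is to inspect the cluster's submission response, recognize the per-user queue-limit rejection as a transient condition, and flag the experiment for re-queueing instead of failing it:

```java
// Hypothetical helper: decides whether a failed HPC job submission
// should be held for later retry instead of failing the experiment.
public class SubmissionResponseClassifier {

    // Substrings seen in per-user queue-limit rejections, e.g. the
    // Stampede2 message quoted in [1] below.
    private static final String[] TRANSIENT_MARKERS = {
        "Too many simultaneous jobs in queue",
        "Max job limits"
    };

    /**
     * Returns true if the submission response indicates a transient
     * per-user queue-limit rejection that warrants re-queueing.
     */
    public static boolean isTransientQueueLimit(String response) {
        if (response == null) {
            return false;
        }
        for (String marker : TRANSIENT_MARKERS) {
            if (response.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

A job-submission task could then place such experiments in an internal queue and retry with back-off once slots free up, tagging the experiment FAILED only for non-transient errors.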
[1]
This example is from Stampede2:
{noformat}
-----------------------------------------------------------------
          Welcome to the Stampede2 Supercomputer
-----------------------------------------------------------------
No reservation for this job
--> Verifying valid submit host (login3)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...FAILED
[*] Too many simultaneous jobs in queue.
--> Max job limits for us3 = 50 jobs
{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)