[ 
https://issues.apache.org/jira/browse/AIRAVATA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lahiru Jayathilake updated AIRAVATA-3893:
-----------------------------------------
    Summary: Support automated HPC Job Re-Submission across Clusters after 
HPC-Side failures  (was: Support for Automatic Resubmission of Failed Jobs 
After Successful Submission)

> Support automated HPC Job Re-Submission across Clusters after HPC-Side 
> failures
> -------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-3893
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-3893
>             Project: Airavata
>          Issue Type: Improvement
>          Components: Airavata System
>            Reporter: Lahiru Jayathilake
>            Priority: Major
>
> Currently, the Airavata Metascheduler cannot automatically resubmit a job 
> to another cluster when the job is submitted successfully but then fails 
> during execution (e.g., due to resource allocation issues).
> This feature request proposes that the Metascheduler handle such failures 
> by automatically attempting to resubmit the failed job to other configured 
> clusters, so that experiments complete more reliably.
> This enhancement will improve the system’s robustness in handling transient 
> failures or resource constraints across multiple clusters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
