Marcus Christie created AIRAVATA-2741:

             Summary: Ideas for better way to deal with arbitrary output files 
                 Key: AIRAVATA-2741
             Project: Airavata
          Issue Type: Improvement
            Reporter: Marcus Christie
            Assignee: Marcus Christie

I want to capture some details of recent conversations with [~eroma_a] and
[~spamidig] on how to improve Airavata's capabilities so we can move beyond
using ARCHIVE. The ARCHIVE capability is a bit of a hack and causes some issues
for us. Briefly, here are some of the problems:
* pulls back absolutely every file, even though some aren't needed and some
intermediate files are very large. For some applications it isn't even
practical to use ARCHIVE
* pulls back duplicates of Application Output files, further filling gateway 
data storage
* these files are basically opaque to Airavata, so there is a limit on what can 
be done in a programmatic way for some of these files

Here are some potential improvements:
* improve wildcard support: allow specifying a wildcard that can match a single
file or multiple files. When multiple files match, they can all be registered
as a URI_COLLECTION type data output.
* show all of the job files in the portal, including ones that aren't defined
as Application Outputs and haven't actually been staged back to the portal, and
allow the user to request pulling back one of these other files. This would be
nice because there are certainly going to be cases where a file is generated
that wasn't anticipated (either because of missing configuration or because it
truly couldn't be anticipated). This would mean registering every file in the
job directory, not just the Application Outputs (not sure where, replica
catalog?). It would also mean we need backend task execution support for
fetching these files as needed.
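
To make the two ideas above a bit more concrete, here is a rough Python sketch.
It is only an illustration, not Airavata code: the function names
(resolve_output, catalog_job_files), the dict layout, and the choice to register
a single match as a plain URI output are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def resolve_output(pattern, job_files):
    """Match a wildcard pattern against the file names in a job directory.

    Hypothetical helper: one match becomes a URI output, several matches
    become a URI_COLLECTION output, no match returns None.
    """
    matches = sorted(name for name in job_files if fnmatchcase(name, pattern))
    if not matches:
        return None  # nothing to register for this output
    if len(matches) == 1:
        return {"type": "URI", "value": matches[0]}
    return {"type": "URI_COLLECTION", "value": matches}


def catalog_job_files(job_files, output_patterns):
    """List every job file, flagging those covered by an output pattern.

    Hypothetical helper: files with application_output=False are the ones
    a user could ask to have pulled back on demand.
    """
    return [
        {
            "name": name,
            "application_output": any(
                fnmatchcase(name, p) for p in output_patterns
            ),
        }
        for name in sorted(job_files)
    ]
```

With this shape, a listing of the job directory is all the portal would need to
show the unstaged files; fetching one of them on request would still require the
backend task support mentioned above.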
