Hi All,

I am trying to revisit the Airavata support for all command line options we 
pass to applications. Airavata's goal is to make end users oblivious to any 
application execution details, but application service providers need 
flexibility to configure all possible application options. 

Terminology such as arguments vs. parameters vs. attributes gets ambiguous. The 
terms differ by definition, but in practice they are often used 
interchangeably. For Airavata, we should avoid confusion between what is 
exposed in WSDLs and what is passed to the application. This matches the 
semantics as well: for instance, an argument is an instance of a parameter. 
This discussion is about what Airavata passes to command line applications. I 
am not suggesting any changes to the WSDLs and schemas, which use XML 
definitions. For applications, I suggest we use the terminology of the POSIX 
standard definitions [1]. I also propose that we try to follow the utility 
syntax guidelines [2]. If an application does not follow these guidelines, we 
suggest it be wrapped in a shell script so we can pass arguments and flags 
conforming to standard practice.

Here, "application" refers to the commands Airavata executes on computational 
resources.

Working directory: Airavata should insist on executing each invocation in a 
unique working directory. Some applications try to change to a static 
directory, but if output and log file names are not unique, repeated or 
concurrent executions risk overwriting each other and producing unintended 
outputs. Also, avoid writing to home directories and source directories. This 
can have side effects, and a runaway log file might fill the disk and block 
further use of that account.
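
As a rough illustration of the unique working directory idea, here is a minimal 
Python sketch. The function name make_working_dir, the scratch_root parameter, 
and the job-name prefix are hypothetical placeholders, not existing Airavata 
APIs; the only assumption is that a scratch area outside the home and source 
directories is available.

```python
import tempfile
from pathlib import Path


def make_working_dir(scratch_root: str, job_name: str) -> Path:
    """Create a fresh, uniquely named directory for one invocation.

    scratch_root and job_name are illustrative parameters (assumptions).
    """
    root = Path(scratch_root)
    root.mkdir(parents=True, exist_ok=True)
    # mkdtemp creates a directory that did not exist before the call,
    # so two invocations never share (or overwrite) a working directory.
    return Path(tempfile.mkdtemp(prefix=f"{job_name}_", dir=root))
```

Each call returns a different directory under the scratch root, so output and 
log files from separate invocations cannot collide.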

Arguments: 
* Airavata should support application arguments and provide a way to mark each 
one as required or optional. 
For optional parameters, the resulting WSDL attributes should have 
minOccurs=0, and Airavata should skip passing the value to the application if 
it is not specified.

* Airavata *should not* support arguments with operands followed by commands. 
These additional commands get forked without any control over their process 
IDs, so monitoring them and tracking the exit status of the series of commands 
gets tricky. Moreover, the underlying grid job managers do not like treating a 
chain of commands as one executable. Instead, encourage explicitly specifying 
the execution chain and the associated I/O.

* Airavata should also support standalone flags (they serve a different purpose 
than option flags). Flags are normally prefixed with '--'. They control the 
execution of the application, e.g. --verbose, --fast, --use-fft, etc.

* Arguments can be passed to the application via standard input (with the 
redirection operator), as name-value pairs, or with option flags. Option flags 
should always be prefixed with '-', per the POSIX standard. 

* If the arguments are preceded by an option flag, they do not need to be 
ordered. But if the arguments are passed just as values, applications are 
sensitive to the order in which they are passed. In this case, optional 
arguments have to be handled carefully, since omitting an argument in the 
middle shifts the ones after it and the application will misinterpret them. 

* If an argument is a file type and the file is given with a supported remote 
protocol (http, ftp, gsiftp, s3), then the file has to be staged first and 
only the local path passed to the application. The application should be able 
to consume the full local path; if only the basename is required, it should 
handle that internally. 

* If an application requires a remote FTP URL as an argument, then it should be 
specified as a string, in which case Airavata will skip staging that URL and 
will pass the URL as is to the application. 

* Implicit Parameters: As much as possible, Airavata should insist on a 
one-to-one match between the inputs specified in the service description and 
what is passed to the application. But there will be exceptions, such as 
Fortran applications that use the NAMELIST standard to specify all inputs in a 
config file and pass only this file to the application. In these cases, some 
data files still need to be staged to the remote compute server, but their 
names are implicitly specified in the application. The application typically 
looks for these files relative to the working directory or to the input 
namelist file. 
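
To make the argument rules above concrete, here is a minimal Python sketch of 
how a command line could be assembled. Everything in it is hypothetical and 
for discussion only: the Arg description class, the build_argv function, and 
the stage callback (which stands in for whatever staging mechanism is used) 
are not existing Airavata APIs. It skips unspecified optional arguments, emits 
standalone flags, stages file arguments given with a remote protocol, passes 
string URLs through untouched, places options before operands as the utility 
syntax guidelines [2] recommend, and refuses an operand that follows an 
omitted optional operand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Remote protocols named in the list above.
REMOTE_SCHEMES = ("http://", "ftp://", "gsiftp://", "s3://")


@dataclass
class Arg:
    """Hypothetical description of one application argument."""
    name: str
    kind: str = "string"          # "string", "file", or "flag"
    option: Optional[str] = None  # e.g. "-i" or "--verbose"; None = operand
    required: bool = True


def build_argv(executable: str, spec: List[Arg], values: Dict[str, object],
               stage: Callable[[str], str]) -> List[str]:
    """Assemble argv; `stage` maps a remote URL to a staged local path."""
    options: List[str] = []
    operands: List[str] = []
    skipped_operand = None
    for arg in spec:
        value = values.get(arg.name)
        if arg.kind == "flag":
            # Standalone flag: present or absent, never carries a value.
            if value:
                options.append(arg.option)
            continue
        if value is None:
            if arg.required:
                raise ValueError(f"missing required argument: {arg.name}")
            if arg.option is None:
                skipped_operand = arg.name
            continue
        if arg.kind == "file" and value.startswith(REMOTE_SCHEMES):
            value = stage(value)  # only the local path reaches the application
        if arg.option is not None:
            options += [arg.option, value]
        elif skipped_operand is not None:
            # An operand after an omitted one would shift position.
            raise ValueError(
                f"{arg.name} follows omitted operand {skipped_operand}")
        else:
            operands.append(value)
    return [executable] + options + operands
```

Returning a list of separate argv entries, rather than one shell string, also 
avoids accidentally chaining extra commands through shell metacharacters.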

Outputs:
* Airavata should support standard output and standard error and optionally 
provide a way to specify the names of the stdout and stderr files. 
* All outputs that need to be staged out of the compute machine or scratch 
working directory should be explicitly specified. 
* If the output file name(s) are predetermined or specified in a config file, 
then the name should be specified in the application description. In cases 
where output file names are not deterministic, a regular expression or a 
containing directory should be specified. 
* If the application requires the output file name to be passed on the command 
line, like -out output.txt, then Airavata should provide support for these 
output flags. 
* Airavata should support outputs which are optionally produced. If an 
optional output is not generated but the application exits with exit code 0, 
then the execution should be marked as successful. (A different discussion on 
application execution success criteria is needed). 
* A default output data directory should be created on the remote compute 
resource. The application description should be able to specify an overriding 
name for this directory. 
* Airavata should support applications/shell script wrappers which print 
name-value pairs of output content or file paths to standard out. 
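
Two of the output rules above can be sketched in Python as follows. This is 
only an illustration under assumptions: the function names are hypothetical, 
and the 'name=value' line format for standard out is an assumed convention, 
since the exact format is still open for discussion.

```python
import re
from pathlib import Path
from typing import Dict, List


def parse_name_value_output(stdout_text: str) -> Dict[str, str]:
    """Collect 'name=value' lines from stdout (assumed format).

    Lines without '=' or with an empty name are ignored.
    """
    outputs: Dict[str, str] = {}
    for line in stdout_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            outputs[name.strip()] = value.strip()
    return outputs


def match_output_files(working_dir: str, pattern: str) -> List[str]:
    """Return names of files in working_dir that fully match the regex.

    Intended for outputs whose file names are not deterministic.
    """
    regex = re.compile(pattern)
    return sorted(p.name for p in Path(working_dir).iterdir()
                  if p.is_file() and regex.fullmatch(p.name))
```

For example, parse_name_value_output("energy = -76.4") returns 
{"energy": "-76.4"}, and match_output_files(workdir, r"frame_\d+\.dat") picks 
up frame_1.dat and frame_22.dat but not frame_x.dat.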

Once we discuss this topic, we should raise JIRAs for any missing features and 
also add these on website/wiki. 

Cheers,
Suresh

[1] - http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
[2] - http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02

