Re: Improvements to Experiment input data model in order to support Gaussian application

Marlon Pierce Wed, 10 Dec 2014 06:18:26 -0800

+1 for more generalization.

We are collecting more raw material for chemistry application use cases at https://cwiki.apache.org/confluence/display/AIRAVATA/Use+Cases. We'll review them (and the bio apps that we also collected previously) in a wiki document to see if our API mappings are correct.

Preliminarily, we see that the command line arguments don't contain the full list of input and output files. Additional required inputs may be passed via control files, environment variables, etc. Examples include data libraries for basis functions, names of checkpoint files, names of output files, and so forth. So we need a way to say that an application may take 4 inputs, for example, but only 1 is needed to construct a valid command line.
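For illustration, a Gaussian input file can name a checkpoint file through a Link 0 line such as `%Chk=water.chk`, so that file is required by the job but never appears on the command line. The Python sketch below assumes that convention; the `referenced_files` helper and the set of directives it checks are hypothetical and not part of Airavata.

```python
import re
from typing import List

# Hypothetical set of Gaussian Link 0 keywords whose values are file names.
FILE_DIRECTIVES = {"chk", "oldchk"}
LINK0 = re.compile(r"\s*%(\w+)\s*=\s*(\S+)")


def referenced_files(input_text: str) -> List[str]:
    """Return files named by Link 0 lines such as "%Chk=water.chk".

    These files are needed by the job even though they never appear
    as command-line arguments.
    """
    files = []
    for line in input_text.splitlines():
        match = LINK0.match(line)
        # Compare keywords in lower case so "%Chk" and "%chk" both match.
        if match and match.group(1).lower() in FILE_DIRECTIVES:
            files.append(match.group(2))
    return files


example = "%Chk=water.chk\n%Mem=2GB\n# HF/6-31G(d) Opt\n\nwater energy\n"
print(referenced_files(example))  # ['water.chk']
```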

On the other hand, I don't think we need the InputMetadataType that Chathuri introduces below. This overlaps with what is already in the compute resource description fields.



Marlon

On 12/8/14, 10:17 PM, Amila Jayasekara wrote:

Hi Chathuri,

I do not know anything about Gaussian, so it's kind of hard for me to
understand what exactly the structures you introduced mean and why
exactly you need them.

A more important question is how to come up with more abstract and
generic Thrift IDLs so that you don't need to change them every time we add
a new application. Going through many example applications is certainly a
good way to understand the broad requirements, and it helps to abstract out
many features.

Thanks
-Thejaka

On Mon, Dec 8, 2014 at 10:22 AM, Chathuri Wimalasena wrote:

Hi Devs,

We are trying to add the Gaussian application using airavata-appcatalog.
While doing that, we have run into some limitations of the current design.

In Gaussian there are several input files. Some of them should be used
when the job run command is generated, but others should not. Those that
are not part of the job run command still need to be staged to the working
directory. The current design does not support such flags.
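A minimal sketch of this first requirement, assuming every input is a file that must be staged and using a hypothetical boolean `add_to_command_line` flag in place of the proposed field. The `Input` class is a simplified stand-in for InputDataObjectType, and `g09` as the executable name is an assumption for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Input:
    """Simplified, hypothetical stand-in for InputDataObjectType."""
    name: str
    value: str  # assumed to be a file name that must be staged
    application_argument: Optional[str] = None  # e.g. a flag such as "-i"
    add_to_command_line: bool = True  # hypothetical flag for the proposed field


def plan_job(executable: str, inputs: List[Input]) -> Tuple[List[str], List[str]]:
    """Return (command-line tokens, files to stage).

    Every input is staged to the working directory, but only inputs
    flagged for the command line are appended to it.
    """
    command = [executable]
    staged = []
    for item in inputs:
        staged.append(item.value)
        if item.add_to_command_line:
            if item.application_argument:
                command.append(item.application_argument)
            command.append(item.value)
    return command, staged


inputs = [
    Input("input", "water.com"),
    Input("checkpoint", "water.chk", add_to_command_line=False),
]
command, staged = plan_job("g09", inputs)
print(" ".join(command))  # g09 water.com
print(staged)             # ['water.com', 'water.chk']
```

Keeping staging and command-line construction as separate decisions is what lets the checkpoint file reach the working directory without being passed as an argument.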

Another interesting feature of Gaussian is that the input file can specify
values for options such as memory and CPU. If the input file includes
those parameters, we need to give priority to those values over the
values specified in the request.
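The priority rule can be sketched as a merge in which input-file values override request values. This assumes Gaussian's `%Mem=` and `%NProcShared=` Link 0 lines and a hypothetical mapping onto request fields named `memory` and `cpu`; it is an illustration, not how Airavata actually reads these values.

```python
import re
from typing import Dict

# Assumed mapping from Gaussian Link 0 keywords to request field names.
KEYWORD_TO_FIELD = {"mem": "memory", "nprocshared": "cpu"}
LINK0 = re.compile(r"\s*%(\w+)\s*=\s*(\S+)")


def resolve_resources(input_text: str, requested: Dict[str, str]) -> Dict[str, str]:
    """Start from the request and let values found in the input file win."""
    resolved = dict(requested)  # copy so the request itself is not modified
    for line in input_text.splitlines():
        match = LINK0.match(line)
        if not match:
            continue
        field = KEYWORD_TO_FIELD.get(match.group(1).lower())
        if field is not None:
            resolved[field] = match.group(2)
    return resolved


request = {"memory": "1GB", "cpu": "4", "walltime": "30"}
text = "%Mem=2GB\n%NProcShared=8\n# B3LYP/6-31G(d) Opt\n"
print(resolve_resources(text, request))
# {'memory': '2GB', 'cpu': '8', 'walltime': '30'}
```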

To support these features, we need to slightly modify our Thrift IDLs,
especially the InputDataObjectType struct.

The current struct is below.

struct InputDataObjectType {
     1: required string name,
     2: optional string value,
     3: optional DataType type,
     4: optional string applicationArgument,
     5: optional bool standardInput = 0,
     6: optional string userFriendlyDescription,
     7: optional string metaData
}

To support the first requirement, we introduce two enums.

enum InputValidityType {
     REQUIRED,
     OPTIONAL
}

enum CommandLineType {
     INCLUSIVE,
     EXCLUSIVE
}

Please excuse the names; you are welcome to suggest better ones.
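A Python mirror of the two proposed enums, with a hypothetical check that every REQUIRED input has a value. Reading INCLUSIVE as "added to the command line" and EXCLUSIVE as "staged only" is an interpretation of the proposal, and the numeric enum values are illustrative.

```python
from enum import Enum
from typing import List, Optional, Tuple


class InputValidityType(Enum):
    REQUIRED = 0
    OPTIONAL = 1


class CommandLineType(Enum):
    INCLUSIVE = 0  # assumed meaning: value goes on the generated command line
    EXCLUSIVE = 1  # assumed meaning: value is only staged


def missing_required(
    inputs: List[Tuple[str, Optional[str], InputValidityType]]
) -> List[str]:
    """Return the names of REQUIRED inputs that have no value."""
    return [
        name
        for name, value, validity in inputs
        if validity is InputValidityType.REQUIRED and not value
    ]


inputs = [
    ("input", "water.com", InputValidityType.REQUIRED),
    ("checkpoint", None, InputValidityType.OPTIONAL),
    ("basis", "", InputValidityType.REQUIRED),
]
print(missing_required(inputs))  # ['basis']
```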

To support the second requirement, we change the metaData field to a map
keyed by another enum, in which we define all the metadata types an input
can have.

enum InputMetadataType {
     MEMORY,
     CPU
}

So the new InputDataObjectType would be as follows.

struct InputDataObjectType {
     1: required string name,
     2: optional string value,
     3: optional DataType type,
     4: optional string applicationArgument,
     5: optional bool standardInput = 0,
     6: optional string userFriendlyDescription,
     7: optional map<InputMetadataType, string> metaData,   // changed
     8: optional InputValidityType inputValid,              // new
     9: optional CommandLineType addedToCommandLine,        // new
     10: optional bool dataStaged = 0                        // new
}

Suggestions are welcome.

Thanks,
Chathuri
