We have several use cases [1] for codes (not just Gaussian) that take a single input file, where that input file may modify or override the memory and CPU requirements specified in other parts of the API call. It may also specify the names and locations of other input and output files (such as checkpoint files). These input files follow the application's input format specification; they aren't resource-supplied helper scripts (that is a different consideration). The application depends on the information in these input files, so if the PBS/SLURM/etc. script specifies incompatible values, the code will crash.

So we have to map this use case onto the API and implementation in order to generate correct job execution scripts. It is hard to capture in the API directly, but we can use a piece of code that understands a specific application's input file format, inspects the user-provided files, and modifies the experiment data as appropriate (changing the memory, the names and number of inputs and outputs, etc.). These small pieces of code need to be application specific and pluggable.
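Roughly, I am imagining something like the sketch below. Everything in it is hypothetical -- ApplicationInputInspector, InputOverrides, etc. are illustrative names, not existing Airavata classes -- but it shows the shape of the plugin: parse the file, report the overrides, and let whichever component hosts it apply them to the experiment.

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical plugin contract -- none of these names exist in Airavata today.
// One implementation per application input format, looked up by application name.
public interface ApplicationInputInspector {

    // Which application this inspector understands, e.g. "Gaussian".
    String applicationName();

    // Parse the user-provided input file and report any overrides it contains.
    InputOverrides inspect(Path inputFile) throws IOException;

    // Values found in the input file that must take priority over the API-call values.
    final class InputOverrides {
        public Integer memoryMB;        // null if the file does not set it
        public Integer cpuCount;        // null if the file does not set it
        public List<String> extraFiles; // e.g. checkpoint files that must be staged
    }
}

Returning a small result object instead of mutating the experiment directly would keep the inspector independent of where it is hosted.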

Two candidate places for this to happen (there may be others) are a) the validation step in the orchestrator, and b) a GFac handler. I don't have a strong argument for one over the other. Other recommendations?

Marlon

[1] https://cwiki.apache.org/confluence/display/AIRAVATA/Use+Cases

On 12/9/14, 11:03 AM, Shameera Rathnayaka wrote:
Hi Suresh,

A Gaussian input file can specify environment requirements alongside the main input. For example, %nprocshared carries the user-defined process count for that experiment, and %mem gives the memory it requires. There are many such commands a user can provide in the input file. In the Gaussian input handler we need to parse these config lines and set the values on the JobExecutionContext. This config format is specific to the Gaussian application; another application may have a different set of configurations, so this handler will be specific to Gaussian.
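To make that concrete, the parsing could look roughly like the sketch below (hypothetical code, just to illustrate the idea; the real handler would set the parsed values on the JobExecutionContext, with unit conversion for %mem):

import java.util.HashMap;
import java.util.Map;

// Rough sketch only: collect Gaussian Link 0 commands ("%key=value" lines at the top
// of the input file) into a map, from which %mem and %nprocshared can then be read.
public class GaussianLink0Parser {

    public static Map<String, String> parse(Iterable<String> inputLines) {
        Map<String, String> link0 = new HashMap<>();
        for (String line : inputLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("%")) {
                if (trimmed.isEmpty()) {
                    continue;      // skip blank lines
                }
                break;             // Link 0 section ends when the route section ("#" line) starts
            }
            int eq = trimmed.indexOf('=');
            if (eq > 1) {
                String key = trimmed.substring(1, eq).trim().toLowerCase();  // e.g. "mem", "nprocshared"
                String value = trimmed.substring(eq + 1).trim();             // e.g. "4GB", "8"
                link0.put(key, value);
            }
        }
        return link0;
    }
}

For an input file whose first lines are %mem=4GB and %nprocshared=8, link0.get("mem") would return "4GB" and link0.get("nprocshared") would return "8".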

Thanks,
Shameera.

On Tue, Dec 9, 2014 at 8:18 AM, Suresh Marru <[email protected]> wrote:

Hi Shameera,

Can you please describe what this Gaussian-specific handler is supposed to do? Anything beyond reading or editing the input file?

Suresh


On 09-Dec-2014, at 1:26 am, Shameera Rathnayaka <[email protected]>
wrote:

Hi All,

I am writing a new handler which is Gaussian specific. I looked for a place to put this handler code in the Airavata main source code, but it seems all the handlers we have in Airavata are bundled with a particular provider. I was initially thinking of creating a new project for this code, but after an offline chat with Marlon we decided to put it in the Airavata main source code so that other developers can also work with these Gaussian handlers. So I am going to create a new module under gfac, named "gfac-application-specific-handlers" (if you have better name suggestions please reply), to keep all application-specific handlers. When we fully integrate the GridChem applications we may end up with a few more application-specific handlers, and those will go under this new module. WDYT?
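To make the layout concrete, here is a rough skeleton of what such a handler class could look like (a sketch only; it assumes the gfac-core handler contract centers on an invoke method over the JobExecutionContext, which we should verify against gfac-core):

// Hypothetical skeleton of a class under the proposed gfac-application-specific-handlers
// module. In practice it would implement the gfac-core handler contract; the exact
// interface and package names should be checked, so they are only indicated in comments.
public class GaussianHandler /* implements GFacHandler (from gfac-core) */ {

    // In the real handler this would be invoke(JobExecutionContext); simplified for the sketch.
    public void invoke(java.nio.file.Path mainInputFile) throws java.io.IOException {
        // 1. Parse the Link 0 section (%mem, %nprocshared, %chk, ...) of the input file.
        // 2. Override the memory / CPU requested on the job description in the execution context.
        // 3. Register any checkpoint or auxiliary files for staging before the job script is generated.
    }
}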

Thanks,
Shameera.

On Mon, Dec 8, 2014 at 12:20 PM, Marlon Pierce <[email protected]> wrote:

That would be great. Please upload them to the Wiki.

Marlon


On 12/8/14, 11:59 AM, Pamidighantam, Sudhakar V wrote:

I would suggest that we look at several quantum chemistry applications
which have slight variations on the theme.  We have NWChem, Gamess, and
Molpro
examples to look at. I can send some input files and/or have a session
to go over the relevant sections. We can do this later today.

Thanks,
Sudhakar.


On Dec 8, 2014, at 10:23 AM, Marlon Pierce <[email protected]> wrote:

  The more examples, the better.  I'd like to find the right balance
between understanding the problem space and making incremental progress.

Marlon

On 12/8/14, 10:38 AM, Pamidighantam, Sudhakar V wrote:

Chathuri:
Thanks for these suggestions. One question I have is whether we should look at some of the input files for the set of applications currently under testing to come up with these requirements. There may be additional requirements in some of the inputs. Of course we can incrementally update the data structures as we test these applications in more depth, but I feel a significant number of application cases should be accommodated with each update. We may target these for rc 0.15 and, depending on the time available, we can look at at least a few more applications.

Comments?

Thanks,
Sudhakar.
On Dec 8, 2014, at 9:22 AM, Chathuri Wimalasena <[email protected]> wrote:

Hi Devs,

We are trying to add the Gaussian application using airavata-appcatalog. While doing that, we have run into some limitations of the current design.

In Gaussian there are several input files; some of them should be used when the job run command is generated, but some should not. Those which are not part of the job run command still need to be staged to the working directory. Such flags are not supported in the current design.

Another interesting Gaussian feature is that the input file can specify values for options such as memory and CPU. If the input file includes those parameters, we need to give them priority over the values specified in the request.

To support these features, we need to slightly modify our Thrift IDLs, especially the InputDataObjectType struct.

Current struct is below.

struct InputDataObjectType {
      1: required string name,
      2: optional string value,
      3: optional DataType type,
      4: optional string applicationArgument,
      5: optional bool standardInput = 0,
      6: optional string userFriendlyDescription,
      7: optional string metaData
}

In order to support the 1st requirement, we introduce two enums.

enum InputValidityType{
REQUIRED,
OPTIONAL
}

enum CommandLineType{
INCLUSIVE,
EXCLUSIVE
}

Please excuse the names; you are welcome to suggest better ones.

To support the 2nd requirement, we change the metaData field to a map keyed by another enum that defines all the metadata types an input can have.

enum InputMetadataType {
      MEMORY,
      CPU
}

So the new InputDataObjectType would be as below.

struct InputDataObjectType {
      1: required string name,
      2: optional string value,
      3: optional DataType type,
      4: optional string applicationArgument,
      5: optional bool standardInput = 0,
      6: optional string userFriendlyDescription,
      7: optional map<InputMetadataType, string> metaData,
      8: optional InputValidityType inputValid,
      9: optional CommandLineType addedToCommandLine,
      10: optional bool dataStaged = 0
}
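
For example (just a sketch, assuming the chained setters the Thrift Java generator usually produces), a Gaussian checkpoint file and the main input file would then be described like this:

import java.util.HashMap;
import java.util.Map;

// Sketch of how the proposed fields could be used from the Thrift-generated Java beans;
// names follow the proposal above and do not exist yet.
public class GaussianInputExamples {

    // A checkpoint file: staged to the working directory, but never added to the run command.
    public static InputDataObjectType checkpointFile() {
        return new InputDataObjectType()
                .setName("gaussian.chk")
                .setType(DataType.URI)
                .setInputValid(InputValidityType.OPTIONAL)
                .setAddedToCommandLine(CommandLineType.EXCLUSIVE)
                .setDataStaged(true);
    }

    // The main input file: its %mem / %nprocshared values take priority over the request.
    public static InputDataObjectType mainInputFile() {
        Map<InputMetadataType, String> metaData = new HashMap<>();
        metaData.put(InputMetadataType.MEMORY, "4GB");
        metaData.put(InputMetadataType.CPU, "8");
        return new InputDataObjectType()
                .setName("gaussian.com")
                .setType(DataType.URI)
                .setInputValid(InputValidityType.REQUIRED)
                .setAddedToCommandLine(CommandLineType.INCLUSIVE)
                .setMetaData(metaData)
                .setDataStaged(true);
    }
}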

Suggestions are welcome.

Thanks,
Chathuri






