Hi all,

This is a follow-up to the discussion we had on the “Current Database
remodeling of Apache Airavata” thread [1], which was followed by a successful
Google Hangout on Air [2].

Current Registry (Registry CPI)

This is the current Data Model design of the Registry. [3]

Currently we use a MySQL database abstracted by an OpenJPA layer.

The Registry contains:

   - Experiment-related data (refer to the Data Model design [3])
     This includes experiment, application and node level statuses and
     errors, scheduling and QoS (user-provided) information, and the inputs
     and outputs of each experiment, node and application.

   - Application Catalogs
     These contain the descriptors (host, application, service).

   - Gateway information

   - User information (mostly admin users of the gateways)


Problems faced


Note: We haven’t done any performance testing on the registry or even
included the current registry in a release yet.


   - Application Catalogs (Descriptors)
     The current Application Catalogs are XML files. We store them in the
     registry as blobs, so we cannot query them.

   - Data Model changes
     The data models are highly hierarchical, which causes a lot of problems
     whenever they need to change.
     The data models are expected to change heavily during the development
     phases (0.12, 0.13, …) until we settle on a concrete solution for the
     production release (1.0).
     With the current implementation, accommodating even a small change to
     the model requires code changes at several levels, and this cost recurs
     because the data models keep changing.

   - Hierarchy causing overhead
     Since the whole current data model is hierarchical, there is a
     significant overhead in retrieving data.
     e.g. to get a whole Experiment, you need to make multiple queries from
     the bottom up, from the job level to the experiment level
     (job → task → node → workflow → experiment).
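
To make the retrieval overhead concrete, here is a rough sketch of the
bottom-up walk (job → task → node → workflow → experiment). The table and
column names are made up for illustration and are not the actual Registry
schema; it uses an in-memory SQLite database rather than MySQL/OpenJPA so
that it runs on its own.

```python
import sqlite3

# Hypothetical layout: one table per level, each row pointing at its parent.
LEVELS = ["job", "task", "node", "workflow", "experiment"]


def build_db():
    """Create an in-memory database holding one chain of parent links."""
    conn = sqlite3.connect(":memory:")
    for level in LEVELS:
        conn.execute(
            f"CREATE TABLE {level} (id TEXT PRIMARY KEY, parent_id TEXT)"
        )
    rows = {
        "experiment": ("exp1", None),
        "workflow": ("wf1", "exp1"),
        "node": ("node1", "wf1"),
        "task": ("task1", "node1"),
        "job": ("job1", "task1"),
    }
    for level, row in rows.items():
        conn.execute(f"INSERT INTO {level} VALUES (?, ?)", row)
    return conn


def resolve_experiment(conn, job_id):
    """Walk from a job up to its experiment, issuing one query per level."""
    current_id = job_id
    queries = 0
    for level in LEVELS[:-1]:  # the experiment itself has no parent to follow
        (current_id,) = conn.execute(
            f"SELECT parent_id FROM {level} WHERE id = ?", (current_id,)
        ).fetchone()
        queries += 1
    return current_id, queries
```

Even in this toy version, resolving a single job to its experiment takes one
round trip per level, before any inputs, outputs or statuses are loaded.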


Use cases

Here are some typical queries Airavata should support (with respect to the
gateways that are being integrated with Airavata).
Some gateways use workflows while others use single job submission.


   - *ParamChem* (Workflow oriented)

      - get the data of each node (of the Workflow)
         - inputs
         - outputs
         - status

      - get updated node data since last retrieval (wish)

   - *CIPRES* (Single Job Submission)

      - get Experiment Summary
         - metadata
         - statistics
            - inputs
            - parameters
            - intermediate data
         - progress

   - Clone an existing experiment (with either different descriptors or
     inputs)

   - Store output files (wish)

   - *UltraScan* (Single Job)

      - get Job level status (Gfac level status); this is the second lowest
        level of statuses, refer to the Data Model Design [3]

      - get Application Level Statuses (the UltraScan application issues
        statuses; we need to get them to the user)

      - get Output data

   - *CyberGateway* (Single Job Submission)

      - get Summary of all Experiments
         - metadata
         - status
         - progress
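
As an illustration of the "get updated node data since last retrieval" wish
from the ParamChem use case, a query along these lines would be enough if
node data were stored flat with a last-updated timestamp. This is only a
sketch: the node_data table, its columns and the status values are
assumptions, not part of the current Data Model.

```python
import sqlite3


def build_db():
    """Create a small in-memory table of per-node data (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node_data ("
        " experiment_id TEXT, node_id TEXT, inputs TEXT, outputs TEXT,"
        " status TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO node_data VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("exp1", "node1", "a.in", "a.out", "COMPLETED", 100),
            ("exp1", "node2", "b.in", None, "RUNNING", 250),
            ("exp2", "node1", "c.in", None, "RUNNING", 300),
        ],
    )
    return conn


def nodes_updated_since(conn, experiment_id, last_seen):
    """Return (node_id, status) for nodes changed after the last retrieval."""
    cursor = conn.execute(
        "SELECT node_id, status FROM node_data"
        " WHERE experiment_id = ? AND updated_at > ?"
        " ORDER BY updated_at",
        (experiment_id, last_seen),
    )
    return cursor.fetchall()
```

A gateway would keep the timestamp of its previous call and pass it as
last_seen, so each poll only returns the nodes that actually changed.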


Requirements/Suggestions

   - The Data Persistence Requirements are documented in [4].

   - Application Catalog
     We need a proper way and place to store the application catalogs so
     that they can be queried.

   - Meta-Data Catalog
     Our Data Model is highly hierarchical.
     Since the Data Models will keep changing during the development phase
     (until a production release), we need a way to accommodate
     hierarchical changes easily.

   - Separate out the Registry, Data Store, Provenance, etc.
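
One possible way to make the application catalog queryable, sketched below,
is to pull the fields we want to search on out of the descriptor XML and
store them as columns next to the raw document. The descriptor format, the
app_descriptor table and the function names here are invented for the
example and do not match the real Airavata descriptors.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical descriptor; the real descriptor schema differs.
DESCRIPTOR_XML = """
<applicationDescriptor>
  <name>echo</name>
  <host>localhost</host>
  <executable>/bin/echo</executable>
</applicationDescriptor>
"""


def build_db():
    """Create a table with searchable columns plus the original XML."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_descriptor"
        " (name TEXT, host TEXT, executable TEXT, raw_xml TEXT)"
    )
    return conn


def store_descriptor(conn, xml_text):
    """Extract the queryable fields and store them alongside the raw XML."""
    root = ET.fromstring(xml_text.strip())
    conn.execute(
        "INSERT INTO app_descriptor VALUES (?, ?, ?, ?)",
        (
            root.findtext("name"),
            root.findtext("host"),
            root.findtext("executable"),
            xml_text,
        ),
    )


def apps_on_host(conn, host):
    """Query by host, which is not possible when the XML is an opaque blob."""
    cursor = conn.execute(
        "SELECT name FROM app_descriptor WHERE host = ? ORDER BY name",
        (host,),
    )
    return [row[0] for row in cursor]
```

Keeping the raw XML in the same row means existing code that expects the
full descriptor keeps working, while new code can filter on the extracted
columns. A document store would be another option; this sketch does not
argue for one over the other.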


Wish List

   - File Management
     Extract metadata from large files and store it.


Special thanks to Saminda for creating the Data Persistence Requirements
document and to the whole Airavata team for helping out with this analysis.

[1]
http://markmail.org/thread/33bwjmgs75um46uc#query:+page:1+mid:4lguliiktjohjmsd+state:results

[2]
http://www.youtube.com/watch?v=EY6oPwqi1g4

[3]
https://github.com/apache/airavata/tree/master/airavata-api

[4]
https://docs.google.com/document/d/1yhUlwq5Q3WNMAan3cdpKYVT2AJsIL3VAEicdRilskRw

-- 
Thanks,
Sachith Withana
