Hi all,
This is a follow-up to the discussion we had on the “Current Database
remodeling of Apache Airavata” thread [1], which was followed by a
successful Google Hangout on Air [2].
Prevailing Registry (Registry CPI)
This is the current Data Model design of the Registry. [3]
Currently we use a MySQL database abstracted by an OpenJPA layer.
The Registry contains:
- Experiment-related data (refer to the Data Model design [3])
  Includes experiment, application, and node level statuses and errors;
  scheduling and QoS (user-provided) information; and the inputs and
  outputs of each experiment, node, and application.
- Application Catalogs
  Contain the descriptors (host, application, service).
- Gateway information
- User information (mostly admin users of the gateways)
Problems faced
Note: We haven’t done any performance testing on the registry or even
included the current registry in a release yet.
- Application Catalogs (Descriptors)
  The current Application Catalogs are XML files. We store them in the
  registry as blobs, so we cannot query their contents.
- Data Model changes
  The data models are highly hierarchical, which causes a lot of problems
  whenever they need to change.
  The data models are expected to change heavily during the development
  phases (0.12, 0.13…) until we settle on a concrete solution for the
  production release (1.0).
  With the current implementation, even a small change to the model
  requires code changes at several levels, and that cost is paid again
  every time the model changes.
- Hierarchy causing overhead
  Since the whole current data model is hierarchical, there is a
  significant overhead in retrieving data.
  e.g. to get a whole Experiment, you need to make multiple queries from
  the bottom up, from the job level to the experiment level
  (job → task → node → workflow → experiment).
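
To make the retrieval overhead concrete, here is a minimal sketch using an
in-memory SQLite database. The schema, table names, and sample rows are
hypothetical and only mirror the five levels named above; this is not the
actual Registry schema or the OpenJPA code path. The sketch walks the tree
from the experiment downwards (the reverse of the bottom-up direction
described above, chosen only to keep the code short) and counts how many
queries it takes to assemble one small experiment.

```python
import sqlite3

# Hypothetical levels mirroring job -> task -> node -> workflow -> experiment.
LEVELS = ["experiment", "workflow", "node", "task", "job"]


def build_db():
    """Create an in-memory database with one small, made-up experiment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
    for parent, child in zip(LEVELS, LEVELS[1:]):
        conn.execute(
            f"CREATE TABLE {child} (id INTEGER PRIMARY KEY, name TEXT, "
            f"parent_id INTEGER REFERENCES {parent}(id))"
        )
    conn.execute("INSERT INTO experiment VALUES (1, 'exp-1')")
    conn.execute("INSERT INTO workflow VALUES (1, 'wf-1', 1)")
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                     [(1, "node-a", 1), (2, "node-b", 1)])
    conn.executemany("INSERT INTO task VALUES (?, ?, ?)",
                     [(1, "task-a", 1), (2, "task-b", 2)])
    conn.executemany("INSERT INTO job VALUES (?, ?, ?)",
                     [(1, "job-a", 1), (2, "job-b", 2)])
    return conn


def load_level_by_level(conn, experiment_id):
    """Assemble the whole tree one row at a time; return (tree, query_count)."""
    queries = 0

    def load(level_index, row_id):
        nonlocal queries
        table = LEVELS[level_index]
        queries += 1  # one query for the row itself
        (name,) = conn.execute(
            f"SELECT name FROM {table} WHERE id = ?", (row_id,)
        ).fetchone()
        children = []
        if level_index + 1 < len(LEVELS):
            child_table = LEVELS[level_index + 1]
            queries += 1  # one more query to find this row's children
            child_ids = [r[0] for r in conn.execute(
                f"SELECT id FROM {child_table} WHERE parent_id = ? ORDER BY id",
                (row_id,),
            )]
            children = [load(level_index + 1, cid) for cid in child_ids]
        return {"level": table, "name": name, "children": children}

    return load(0, experiment_id), queries


conn = build_db()
tree, queries = load_level_by_level(conn, 1)
print(f"queries needed for one experiment: {queries}")
```

The query count grows with the number of rows at every level, which is the
overhead described above; flattening or denormalising parts of the model
would be one way to reduce it, at the cost of other trade-offs.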
Use cases
Here are some typical queries Airavata should support, based on the
gateways that are being integrated with Airavata.
Some gateways use workflows, while others use single job submission.
- *ParamChem* (Workflow oriented)
  - get the data of each node (of the Workflow)
    - inputs
    - outputs
    - status
  - get updated node data since last retrieval (wish)
- *CIPRES* (Single Job Submission)
  - get Experiment Summary
    - metadata
    - statistics
    - inputs
    - parameters
    - intermediate data
    - progress
  - clone an existing experiment (with either different descriptors or
    inputs)
  - store output files (wish)
- *UltraScan* (Single Job)
  - get Job level status (GFac level status); it is the second-lowest
    level of statuses, refer to the Data Model design [3]
  - get Application level statuses (the UltraScan application issues
    statuses, and we need to get them to the user)
  - get Output data
- *CyberGateway* (Single Job Submission)
  - get Summary of all Experiments
    - metadata
    - status
    - progress
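
As an illustration of the ParamChem wish item "get updated node data since
last retrieval", here is a rough sketch of an incremental status query. The
`node_status` table, its columns, the integer timestamps, and the function
name `nodes_updated_since` are all assumptions made for this example; the
current Registry does not necessarily track an update time per node.

```python
import sqlite3


def nodes_updated_since(conn, experiment_id, last_seen):
    """Return (node_id, status, updated_at) rows changed after last_seen."""
    return conn.execute(
        "SELECT node_id, status, updated_at FROM node_status "
        "WHERE experiment_id = ? AND updated_at > ? ORDER BY updated_at",
        (experiment_id, last_seen),
    ).fetchall()


# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_status ("
    "node_id TEXT, experiment_id TEXT, status TEXT, updated_at INTEGER)"
)
conn.executemany("INSERT INTO node_status VALUES (?, ?, ?, ?)", [
    ("node-a", "exp-1", "COMPLETED", 100),
    ("node-b", "exp-1", "RUNNING", 250),
    ("node-c", "exp-2", "RUNNING", 300),
])

# The client remembers the newest timestamp it has seen (200 here) and
# asks only for rows that changed afterwards.
changed = nodes_updated_since(conn, "exp-1", 200)
print(changed)
```

The client would keep the largest `updated_at` it has received and pass it
as `last_seen` on the next call, so each poll returns only the changes.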
Requirements/Suggestions
- The Data Persistence Requirements are documented in [4].
- Application Catalog
  A proper way and place to store the application catalogs so that they
  can be queried.
- Meta-Data Catalog
  Our Data Model is highly hierarchical.
  Since the Data Models will keep changing during the development phase
  (until a production release), we need a design that can accommodate
  changes to the hierarchy.
- Separate out the Registry, Data Store, Provenance, etc.
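
One possible way to make the Application Catalog queryable, sketched under
assumptions: keep the XML document as it is, but copy the fields we want to
search on into ordinary columns when the descriptor is stored. The
descriptor below, its element names, the `host_descriptor` table, and the
`store_descriptor` helper are all made up for this example and do not
reflect Airavata's actual descriptor schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical host descriptor; element names are illustrative only.
HOST_DESCRIPTOR = """
<hostDescriptor>
  <hostName>cluster-a</hostName>
  <hostAddress>cluster-a.example.org</hostAddress>
</hostDescriptor>
""".strip()


def store_descriptor(conn, xml_text):
    """Store the raw XML plus the fields we want to be able to query on."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO host_descriptor (host_name, host_address, raw_xml) "
        "VALUES (?, ?, ?)",
        (root.findtext("hostName"), root.findtext("hostAddress"), xml_text),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host_descriptor ("
    "host_name TEXT PRIMARY KEY, host_address TEXT, raw_xml TEXT)"
)
store_descriptor(conn, HOST_DESCRIPTOR)

# The descriptor can now be found by a field value instead of by
# loading and parsing every blob.
row = conn.execute(
    "SELECT host_address FROM host_descriptor WHERE host_name = ?",
    ("cluster-a",),
).fetchone()
print(row)
```

This is only one option; a document store or a fully relational descriptor
model would be alternatives worth weighing against the requirements in [4].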
Wish List
- File Management
  Extract metadata from large files and store it.
Special thanks to Saminda for creating the Data Persistence Requirements
document, and to the whole Airavata team for helping out on this analysis.
[1] http://markmail.org/thread/33bwjmgs75um46uc#query:+page:1+mid:4lguliiktjohjmsd+state:results
[2] http://www.youtube.com/watch?v=EY6oPwqi1g4
[3] https://github.com/apache/airavata/tree/master/airavata-api
[4] https://docs.google.com/document/d/1yhUlwq5Q3WNMAan3cdpKYVT2AJsIL3VAEicdRilskRw
--
Thanks,
Sachith Withana