Patches are welcome...it will also help us make sure we are communicating the use cases.
Marlon

On 3/6/14 11:52 PM, Mattmann, Chris A (3980) wrote:
> Great description of the use cases.
>
> Hate to sound like a broken record here, but the Apache OODT file manager
> along with its integration with Apache Lucene/Solr, and Tika and a number
> of other technologies I think totally fits the need here.
>
> I realize that along with that I should put my money where my mouth
> is with the old "patches welcome" moniker :) Maybe I will.. :)
>
> Just my 2c.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-283, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Sachith Withana <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, March 6, 2014 8:28 PM
> To: "[email protected]" <[email protected]>
> Subject: Airavata Data Management Challenges
>
>> Hi all,
>>
>> This is a follow up discussion of what we had on the “Current Database
>> remodeling of Apache Airavata” [1], followed by a successful Google on air
>> hangout. [2]
>>
>> Prevailing Registry (Registry CPI)
>>
>> This is the current Data Model design of the Registry. [3]
>>
>> Currently we use a MySQL database abstracted by an OpenJPA layer.
>>
>> The Registry contains
>>
>> - Experiment related data (refer to the Data Model design: [3])
>>   includes experiment, application and node level statuses, errors,
>>   Scheduling and QOS (user provided) information, and the inputs and
>>   outputs of each experiment, node and application.
>> - Application Catalogs
>>   contains the descriptors (host, application, service)
>> - Gateway Information
>> - User information (mostly admin users of the gateways)
>>
>> Problems faced
>>
>> Note: We haven’t done any performance testing on the registry or even
>> included the current registry in a release yet.
>>
>> - Application Catalogs (Descriptors)
>>   The current Application Catalogs are XML files.
>>   We are storing them in the registry as blobs → we cannot query them.
>> - Data Model Changes
>>   The data models are highly hierarchical. This causes a lot of problems
>>   when the Data Models need to be changed.
>>   Data Models are expected to change heavily within the development
>>   phases (0.12, 0.13…) until we settle on a concrete solution for the
>>   production release (1.0).
>>   To accommodate even small changes to the model, we need to go
>>   through several levels of costly code level changes due to the current
>>   implementation.
>>   This is costly since the Data Models keep changing all the time.
>> - Hierarchy causing overhead
>>   Since the whole current data model is hierarchical, there is a
>>   significant overhead in retrieving data.
>>   ex: To get an Experiment, you need to make multiple queries from the
>>   bottom up (from the job level to the experiment level: job → task →
>>   node → workflow → experiment) to get the whole Experiment.
>>
>> Use cases
>>
>> Here are some typical queries Airavata should support (with respect to
>> the gateways that are being integrated with Airavata).
>> Some gateways use workflows while the others use single job submission.
>>
>> - *ParamChem* (Workflow oriented)
>>   - get the data of each node (of the Workflow)
>>     - inputs
>>     - outputs
>>     - status
>>   - get updated node data since last retrieval (wish)
>>
>> - *CIPRES* (Single Job Submission)
>>   - get Experiment Summary
>>     - metadata
>>     - statistics
>>     - inputs
>>     - parameters
>>     - intermediate data
>>     - progress
>>   - Clone an existing experiment (with either different descriptors or
>>     inputs)
>>   - Store output files (wish)
>>
>> - *UltraScan* (Single Job)
>>   - get Job level status (Gfac level status) (it’s the second lowest
>>     level of statuses, refer to the Data Model Design [3])
>>   - get Application Level Statuses (the UltraScan application issues
>>     statuses, we need to get them to the user)
>>   - get Output data
>>
>> - *CyberGateway* (Single Job Submission)
>>   - get Summary of all Experiments
>>     - metadata
>>     - status
>>     - progress
>>
>> Requirements/Suggestions
>>
>> - Here are the Data Persistence Requirements [4]
>> - Application Catalog
>>   A proper way and place to store the application catalogs so that they
>>   can be queried
>> - Meta-Data Catalog
>>   Our Data Model is highly hierarchical.
>>   Since the Data Models will keep changing in the development phase
>>   (until a production release), we need to come up with a way to make it
>>   accommodate the hierarchical changes
>> - Separate out the registry, Data Store, Provenance ...etc
>>
>> Wish List
>>
>> - File Management
>>   Extract metadata from large files and store it
>>
>> Special Thanks to Saminda for creating the Data Persistence requirements
>> document and the whole Airavata team for helping out on this analysis.
>>
>> [1] http://markmail.org/thread/33bwjmgs75um46uc#query:+page:1+mid:4lguliiktjohjmsd+state:results
>> [2] http://www.youtube.com/watch?v=EY6oPwqi1g4
>> [3] https://github.com/apache/airavata/tree/master/airavata-api
>> [4] https://docs.google.com/document/d/1yhUlwq5Q3WNMAan3cdpKYVT2AJsIL3VAEicdRilskRw
>>
>> --
>> Thanks,
>> Sachith Withana
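
To make the "Hierarchy causing overhead" point in the quoted message concrete, here is a minimal sketch of the bottom-up walk (job → task → node → workflow → experiment) next to a single joined query. Everything in it is an assumption for illustration: the table layout, the `parent_id` and `status` columns, the IDs, and the function names are hypothetical and are not the actual Airavata registry schema, and it uses Python's built-in SQLite rather than MySQL behind OpenJPA.

```python
import sqlite3

# Hypothetical levels, top to bottom, mirroring the hierarchy named in the thread.
LEVELS = ["experiment", "workflow", "node", "task", "job"]


def build_demo_db():
    """Create an in-memory database with one illustrative row per level."""
    conn = sqlite3.connect(":memory:")
    for level in LEVELS:
        # Assumed layout: every level points at its parent through parent_id.
        conn.execute(
            f"CREATE TABLE {level} (id TEXT PRIMARY KEY, parent_id TEXT, status TEXT)"
        )
    rows = {
        "experiment": ("exp1", None, "EXECUTING"),
        "workflow": ("wf1", "exp1", "EXECUTING"),
        "node": ("node1", "wf1", "EXECUTING"),
        "task": ("task1", "node1", "EXECUTING"),
        "job": ("job1", "task1", "ACTIVE"),
    }
    for level, row in rows.items():
        conn.execute(f"INSERT INTO {level} VALUES (?, ?, ?)", row)
    return conn


def walk_up(conn, job_id):
    """Bottom-up retrieval: one query per level, from job up to experiment."""
    chain = []
    queries = 0
    current_id = job_id
    for level in reversed(LEVELS):
        row = conn.execute(
            f"SELECT id, parent_id, status FROM {level} WHERE id = ?",
            (current_id,),
        ).fetchone()
        queries += 1
        if row is None:
            raise KeyError(f"no {level} with id {current_id!r}")
        chain.append((level, row[0], row[2]))
        current_id = row[1]
    return chain, queries


def fetch_joined(conn, job_id):
    """Same identifiers in a single round trip, using one JOIN across all levels."""
    return conn.execute(
        "SELECT e.id, w.id, n.id, t.id, j.id "
        "FROM job j "
        "JOIN task t ON j.parent_id = t.id "
        "JOIN node n ON t.parent_id = n.id "
        "JOIN workflow w ON n.parent_id = w.id "
        "JOIN experiment e ON w.parent_id = e.id "
        "WHERE j.id = ?",
        (job_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    chain, queries = walk_up(conn, "job1")
    print(f"bottom-up walk used {queries} queries: {chain}")
    print(f"single join returned: {fetch_joined(conn, 'job1')}")
```

The walk issues one query per level, so its cost grows with the depth of the hierarchy, while the join pays for a single round trip. Whether a join, a denormalized summary table, or a different store fits the registry best is exactly the open question in this thread.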
