Re: Object Database Suggestions for Airavata Registry

K Yoshimoto Thu, 27 Feb 2014 10:10:35 -0800

I happened to look through the data model.  

> > https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD


How is information on the input data and transfer method stored?
Is that freeform text in DataTransferDetails?

Also, is there a place to describe preprocessing of job input data
or a custom submit command?

Sorry for the sidetrack.

Kenneth

On Wed, Feb 26, 2014 at 02:13:46AM +0530, Shameera Rathnayaka wrote:
> Hi all,
> 
> Just thinking aloud here, sorry if I am moving this thread in another
> direction.
> 
> If we are going to use our own registry implementation, should we consider
> providing a database layer where we can plug in different kinds of
> databases? (Maybe Supun was suggesting the same in his previous reply.) As
> we are already separating SPIs and APIs for other components, we can do the
> same for the DB implementation too. NoSQL databases like Cassandra also
> have a CQL driver that is used much like the MySQL driver, so it should not
> be difficult to implement a pluggable environment.
> 
> The WSO2 Registry already has this capability but, as far as I know, has
> not yet implemented CQL.
> 
> Thanks,
> Shameera.
> 
> 
> On Wed, Feb 26, 2014 at 1:36 AM, Saminda Wijeratne <[email protected]>wrote:
> 
> > Sorry I missed the arrow from Registry to Orchestrator. Thanks for pointing
> > it out Marlon. Updated the arrows and added a legend.
> >
> > The broken-line arrow is involved in the MessageBox component, where it
> > gets triggered from time to time without external user intervention.
> > Also, there are still some technical details we need to figure out on how
> > the MessageBox will function and expose itself in the new design.
> >
> >
> > On Tue, Feb 25, 2014 at 2:36 PM, Marlon Pierce <[email protected]> wrote:
> >
> > > Please define the solid and broken line arrows.  Why doesn't the
> > > orchestrator interact with the registry?
> > >
> > >
> > > Marlon
> > >
> > > On 2/25/14 2:29 PM, Saminda Wijeratne wrote:
> > > > The diagrams @[1] will depict functional requirements (at an
> > > > abstract-level) for Airavata from CIPRES and UltraScan gateways.
> > > >
> > > > 1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3
> > > >
> > > >
> > > > On Mon, Feb 24, 2014 at 3:01 PM, Milinda Pathirage <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi Suresh,
> > > >>
> > > > >> Collections are similar to directories and resources are similar to
> > > > >> files. WSO2 Registry implements various functionalities on top of
> > > > >> this abstraction. In one of our projects we use this abstraction to
> > > > >> implement persistent storage for a text mining workflow. Our text
> > > > >> mining workflow starts with a workset, which is a collection of
> > > > >> books. We represent this workset as a collection in WSO2 Registry
> > > > >> under the user's collection (which can be thought of as a workspace
> > > > >> specific to the user; other users can't access this workspace). This
> > > > >> workset can contain one or more resources or collections. The
> > > > >> current implementation only supports a single resource, which is a
> > > > >> list of book identifiers. When a user starts a text analysis job on
> > > > >> this workset, the job manager reads the necessary information
> > > > >> (currently the list of books) from the workset, downloads the
> > > > >> necessary files from an API, runs analysis algorithms on the
> > > > >> downloaded files, and finally saves the results back in another
> > > > >> registry collection. This model is pretty extensible for our use
> > > > >> case because if we want some additional files or data in the future
> > > > >> we just need to add another resource or another collection to the
> > > > >> workset collection. Then the application can decide what to process
> > > > >> and what not to process.
> > > >>
> > > > >> I think you also need some abstraction like that. I am not sure
> > > > >> whether the collections and resources abstraction is the best for
> > > > >> you. The level of abstraction will depend on your use cases and
> > > > >> requirements.
> > > >>
> > > >> Thanks
> > > >> Milinda
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Feb 24, 2014 at 2:00 PM, Suresh Marru <[email protected]>
> > > wrote:
> > > >>
> > > >>> On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <
> > > >>> [email protected]> wrote:
> > > >>>
> > > > >>>> I also think that moving to Cassandra or any other NoSQL will add
> > > > >>>> unnecessary complexity to your solution. Also, designing proper
> > > > >>>> (easy to manage changes, easy to query) NoSQL data models is hard
> > > > >>>> (AFAIK, it requires lots of experience and understanding of data
> > > > >>>> structures and queries). Also, migrating from one NoSQL technology
> > > > >>>> to another can require a complete rewrite. And current relational
> > > > >>>> databases can handle heavy loads, except Google-, Twitter-,
> > > > >>>> Amazon-, and Facebook-like loads. I don't think Airavata will see
> > > > >>>> Google- and Amazon-like loads.
> > > >>>>
> > > > >>>> If the constant changes to the data model are the problem, I think
> > > > >>>> the best option is to abstract the registry implementation to
> > > > >>>> something like the collections and resources used in WSO2 Registry
> > > > >>>> [1] or something suitable for the Airavata context. That will make
> > > > >>>> it easy to handle changes in the data model.
> > > > >>> You stated it right, Milinda: Airavata does not have scaling needs
> > > > >>> that will go beyond RDBMS limits, but it needs this abstraction.
> > > >>>
> > > > >>> Can anyone elaborate more on the collections and resources used in
> > > > >>> WSO2 Registry?
> > > >>>
> > > >>> Suresh
> > > >>>
> > > > >>>> Also, don't let the technologies drive design decisions. It's
> > > > >>>> always better to let use cases drive the design decisions.
> > > >>>>
> > > >>>> Thanks
> > > >>>> Milinda
> > > >>>>
> > > >>>> [1] http://wso2.com/products/governance-registry/
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <
> > > >> [email protected]
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > > >>>>> I'm not trying to discourage you in your exploration of NoSQL
> > > > >>>>> databases. I have the following concern.
> > > > >>>>>
> > > > >>>>> Your database schema is moderately complex (even for an RDBMS it
> > > > >>>>> seems complex) and the data size is relatively small. I'm not
> > > > >>>>> sure about the current tools available, but I think you will
> > > > >>>>> need to write more code to support all your requirements in a
> > > > >>>>> NoSQL database. So writing more code and allowing redundancy to
> > > > >>>>> support *relatively small* and *structured data* doesn't seem
> > > > >>>>> right to me. Maybe I'm wrong and there are better tools in NoSQL
> > > > >>>>> than in RDBMS, which I doubt.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Supun..
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <[email protected]>
> > > >>> wrote:
> > > >>>>>> Hi All,
> > > >>>>>>
> > > > >>>>>> Airavata is actively migrating to a Thrift API for the RESTless
> > > > >>>>>> design and to facilitate various language bindings from client
> > > > >>>>>> gateways. The programming language support in Thrift has so far
> > > > >>>>>> been very encouraging. The current architecture looks like
> > > > >>>>>> Figure 1 at [1].
> > > >>>>>>
> > > > >>>>>> Language-specific clients will be released as Thrift SDKs
> > > > >>>>>> (similar to the Evernote SDKs [2]). These clients will be
> > > > >>>>>> integrated into gateway portals, which connect to the API Server.
> > > > >>>>>> The API operations broker the simple calls into one or more
> > > > >>>>>> backend CPI calls (Airavata internal component interfaces). An
> > > > >>>>>> example set of mappings is illustrated in Figure 2 at [1]. The
> > > > >>>>>> current draft of the Thrift API for version 0.12 is at [3];
> > > > >>>>>> please pay attention to the experiment model at [4].
> > > >>>>>>
> > > > >>>>>> For the persistent store, we have had a few iterations of the
> > > > >>>>>> Airavata Registry, shifting from a legacy XRegistry to JackRabbit
> > > > >>>>>> to now an OpenJPA-based registry. To allow the API and the
> > > > >>>>>> associated data models to evolve, it will be useful to explore
> > > > >>>>>> object databases so we can store the serialized version of Thrift
> > > > >>>>>> objects directly. But it will be nice to have all (or most) of
> > > > >>>>>> the fields queryable. This calls for a more column-family design
> > > > >>>>>> in any NoSQL approach.
> > > >>>>>>
> > > >>>>>> Any recommendations for a registry architecture?
> > > >>>>>>
> > > > >>>>>> Quickly hacking through, I find the following approach a viable
> > > > >>>>>> one: ZombieDB [5] over Astyanax [6], which talks to Cassandra.
> > > > >>>>>> Airavata can benefit immediately from the replication and
> > > > >>>>>> reliability of Cassandra, and from its scalability in the near
> > > > >>>>>> future. Some of the model objects, such as those for experiment
> > > > >>>>>> creation, will need strong consistency, and most of the
> > > > >>>>>> monitoring can live with eventual consistency.
> > > >>>>>>
> > > >>>>>> Critical comments please?
> > > >>>>>>
> > > >>>>>> Thanks for your time,
> > > >>>>>> Suresh
> > > >>>>>>
> > > >>>>>> [1] -
> > > >>>>>>
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> > > >>>>>> [2] - https://dev.evernote.com/doc/
> > > >>>>>> [3] -
> > > >>>>>>
> > > >>
> > >
> > https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> > > >>>>>> [4] -
> > > >>>>>>
> > > >>
> > >
> > https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> > > >>>>>> [5] - https://github.com/MisterTea/ZombieDB
> > > >>>>>> [6] - https://github.com/Netflix/astyanax
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Supun Kamburugamuva
> > > >>>>> Member, Apache Software Foundation; http://www.apache.org
> > > >>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
> > > >>>>> Blog: http://supunk.blogspot.com
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Milinda Pathirage
> > > >>>> PhD Student Indiana University, Bloomington;
> > > >>>> E-mail: [email protected]
> > > >>>> Web: http://mpathirage.com
> > > >>>> Blog: http://blog.mpathirage.com
> > > >>>
> > > >>
> > > >> --
> > > >> Milinda Pathirage
> > > >> PhD Student Indiana University, Bloomington;
> > > >> E-mail: [email protected]
> > > >> Web: http://mpathirage.com
> > > >> Blog: http://blog.mpathirage.com
> > > >>
> > >
> > >
> >
> 
> 
> 
> -- 
> Best Regards,
> Shameera Rathnayaka.
> 
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/
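
For reference, the pluggable database layer Shameera suggests in the quoted
message could be sketched roughly as below. This is only an illustration under
stated assumptions, not Airavata's registry code: the names RegistryStore,
InMemoryStore, SqliteStore and save_experiment are hypothetical, and SQLite
stands in for whichever relational or CQL-backed store would really be plugged
in.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class RegistryStore(ABC):
    """Hypothetical SPI: the only surface the registry API depends on."""

    @abstractmethod
    def put(self, key: str, record: dict) -> None:
        """Store (or overwrite) a record under the given key."""

    @abstractmethod
    def get(self, key: str) -> Optional[dict]:
        """Return the record for key, or None if it does not exist."""


class InMemoryStore(RegistryStore):
    """Dictionary-backed store, useful for tests."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def get(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


class SqliteStore(RegistryStore):
    """Relational store; records are kept as JSON text in one table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS registry "
            "(key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, key: str, record: dict) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO registry (key, body) VALUES (?, ?)",
                (key, json.dumps(record)),
            )

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM registry WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def save_experiment(store: RegistryStore, experiment_id: str, name: str) -> None:
    """API-side code sees only the SPI, never a concrete database."""
    store.put(experiment_id, {"name": name})
```

Swapping the backend then means passing a different RegistryStore to
save_experiment; the calling code does not change.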

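The collections and resources abstraction Milinda describes in the quoted
thread (collections behave like directories, resources like files) might look
something like the sketch below. It is a toy in-memory model for illustration
only and does not use the WSO2 Registry API; the Resource and Collection
classes and the example paths are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Resource:
    """Leaf node, analogous to a file."""
    content: bytes


@dataclass
class Collection:
    """Container node, analogous to a directory."""
    children: Dict[str, Union["Collection", Resource]] = field(default_factory=dict)

    def put(self, path: str, node: Union["Collection", Resource]) -> None:
        """Store node at path, creating intermediate collections as needed."""
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("path must name at least one node")
        current = self
        for name in parts[:-1]:
            child = current.children.setdefault(name, Collection())
            if not isinstance(child, Collection):
                raise ValueError(f"{name!r} is a resource, not a collection")
            current = child
        current.children[parts[-1]] = node

    def get(self, path: str) -> Union["Collection", Resource]:
        """Return the node at path; raises KeyError if it does not exist."""
        node: Union[Collection, Resource] = self
        for name in (p for p in path.split("/") if p):
            if not isinstance(node, Collection):
                raise KeyError(path)
            node = node.children[name]
        return node


# A workset is a collection under the user's own collection. Today it holds
# one resource (the book list); adding more data later is just another put().
root = Collection()
root.put("/users/alice/worksets/ws1/books", Resource(b"book-1\nbook-2"))
root.put("/users/alice/worksets/ws1/stopwords", Resource(b"the\na"))
workset = root.get("/users/alice/worksets/ws1")
```

The point of the design is that a job reading the workset can ignore children
it does not understand, so new resources can be added without changing the
storage model.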