Re: Object Database Suggestions for Airavata Registry

Suresh Marru Thu, 27 Feb 2014 14:35:46 -0800

On Feb 27, 2014, at 1:09 PM, K Yoshimoto <[email protected]> wrote:

> 
> I happened to look through the data model.  
> 
>>> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> 
> How is information on the input data and transfer method stored?
> Is that freeform text in DataTransferDetails?


Hi Kenneth,

The current API draft only supports simple input types, and any files have to 
be passed as URLs. This works on the assumption that the portal which connects 
to Airavata has a file staging component. Do you think it would be better if 
the Airavata API provided the ability to upload the input data files, as 
opposed to providing a URL? 
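
To make the two options concrete, here is a rough sketch of how a file input 
could be described either as a URL reference or as uploaded bytes. This is not 
the Airavata API: `FileInput`, `from_url`, `from_upload` and 
`needs_portal_staging` are hypothetical names used only for illustration, and 
the example URL is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInput:
    """Hypothetical description of one file input (not an Airavata type)."""

    name: str
    url: Optional[str] = None        # option 1: the portal stages the file and passes a URL
    content: Optional[bytes] = None  # option 2: the client uploads the bytes through the API

    def __post_init__(self) -> None:
        # Exactly one of the two sources must be provided.
        if (self.url is None) == (self.content is None):
            raise ValueError("provide exactly one of url or content")

    @classmethod
    def from_url(cls, name: str, url: str) -> "FileInput":
        return cls(name=name, url=url)

    @classmethod
    def from_upload(cls, name: str, content: bytes) -> "FileInput":
        return cls(name=name, content=content)

    def needs_portal_staging(self) -> bool:
        # A URL input only works if something outside the API staged the file.
        return self.url is not None


staged = FileInput.from_url("input.dat", "gsiftp://example.org/data/input.dat")
uploaded = FileInput.from_upload("input.dat", b"raw bytes here")
```

With a shape like this, a gateway without its own staging component could 
still submit files, while portals that already stage data keep passing URLs.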

> 
> Also, is there a place to describe preprocessing of job input data
> or a custom submit command?

Allowing pre-processing steps for the job itself is a good requirement. I 
think we should consider this for the 0.13 release. 
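
One possible shape for this, sketched under assumptions: a job description 
that carries an ordered list of pre-processing commands and an optional custom 
submit command. `JobDescription`, its fields, `build_script` and `submit_argv` 
are hypothetical and are not part of the 0.12 draft.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobDescription:
    """Hypothetical job description (not an Airavata type)."""

    executable: str
    arguments: List[str] = field(default_factory=list)
    preprocess: List[str] = field(default_factory=list)  # shell commands run before the job
    submit_command: Optional[str] = None                 # e.g. "qsub -q normal"; None means plain sh


def build_script(job: JobDescription) -> str:
    """Render a shell script: pre-processing steps first, then the job itself."""
    lines = ["#!/bin/sh", "set -e"]  # stop if a pre-processing step fails
    lines.extend(job.preprocess)
    lines.append(" ".join(shlex.quote(p) for p in [job.executable, *job.arguments]))
    return "\n".join(lines) + "\n"


def submit_argv(job: JobDescription, script_path: str) -> List[str]:
    """Return the argv used to submit the script, honoring a custom submit command."""
    return shlex.split(job.submit_command or "sh") + [script_path]


job = JobDescription(
    executable="/opt/app/run",
    arguments=["input.dat"],
    preprocess=["gunzip input.dat.gz"],
    submit_command="qsub -q normal",
)
script = build_script(job)
argv = submit_argv(job, "job.sh")
```

Keeping the pre-processing steps as data in the job description, rather than 
baking them into the application wrapper, would let each gateway supply its 
own steps.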

Thanks for taking the time to review the API.
Suresh

> 
> Sorry for the sidetrack.
> 
> Kenneth
> 
> On Wed, Feb 26, 2014 at 02:13:46AM +0530, Shameera Rathnayaka wrote:
>> Hi all,
>> 
>> Just thinking aloud here, sorry if I am moving this thread in another
>> direction.
>> 
>> If we are going to use our own registry implementation, should we consider
>> providing a database layer where we can plug in different kinds of
>> databases? (Maybe Supun was also suggesting the same in his previous reply.)
>> As we are already separating SPIs and APIs for other components, we can do
>> the same for the DB implementation too. NoSQL databases like Cassandra also
>> have a CQL driver which is identical to the MySQL driver, so it is not
>> difficult to implement a pluggable environment.
>> 
>> The WSO2 Registry already has the above capability, but as far as I know
>> it has not yet implemented CQL.
>> 
>> Thanks,
>> Shameera.
>> 
>> 
>> On Wed, Feb 26, 2014 at 1:36 AM, Saminda Wijeratne <[email protected]> wrote:
>> 
>>> Sorry I missed the arrow from Registry to Orchestrator. Thanks for pointing
>>> it out Marlon. Updated the arrows and added a legend.
>>> 
>>> The broken-line arrows involve the MessageBox component, which gets
>>> triggered from time to time without external user intervention. Also,
>>> there are still some technical details we need to figure out on how the
>>> MessageBox will function and expose itself in the new design.
>>> 
>>> 
>>> On Tue, Feb 25, 2014 at 2:36 PM, Marlon Pierce <[email protected]> wrote:
>>> 
>>>> Please define the solid and broken line arrows.  Why doesn't the
>>>> orchestrator interact with the registry?
>>>> 
>>>> 
>>>> Marlon
>>>> 
>>>> On 2/25/14 2:29 PM, Saminda Wijeratne wrote:
>>>>> The diagrams at [1] depict the functional requirements (at an
>>>>> abstract level) for Airavata from the CIPRES and UltraScan gateways.
>>>>> 
>>>>> 1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3
>>>>> 
>>>>> 
>>>>> On Mon, Feb 24, 2014 at 3:01 PM, Milinda Pathirage <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hi Suresh,
>>>>>> 
>>>>>> Collections are similar to directories and resources are similar to
>>>>>> files. WSO2 Registry implements various functionalities on top of this
>>>>>> abstraction. In one of our projects we use this abstraction to implement
>>>>>> persistent storage for a text mining workflow. Our text mining workflow
>>>>>> starts with a workset, which is a collection of books. We represent this
>>>>>> workset as a collection in WSO2 Registry under the user's collection
>>>>>> (which can be thought of as a workspace specific to the user that other
>>>>>> users can't access). This workset can contain one or more resources or
>>>>>> collections. The current implementation supports only a single resource,
>>>>>> which is a list of book identifiers. When a user starts a text analysis
>>>>>> job on this workset, the job manager reads the necessary information
>>>>>> (currently the list of books) from the workset, downloads the necessary
>>>>>> files from an API, runs analysis algorithms on the downloaded files, and
>>>>>> finally saves the results back in another registry collection. This model
>>>>>> is pretty extensible for our use case because if we want some additional
>>>>>> files or data in the future we just need to add another resource or
>>>>>> another collection to the workset collection. Then the application can
>>>>>> decide what to process and what not to process.
>>>>>> 
>>>>>> I think you also need some abstraction like that. I am not sure whether
>>>>>> the collections and resources abstraction is the best one for you. The
>>>>>> level of abstraction will depend on your use cases and requirements.
>>>>>> 
>>>>>> Thanks
>>>>>> Milinda
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Feb 24, 2014 at 2:00 PM, Suresh Marru <[email protected]> wrote:
>>>>>> 
>>>>>>> On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> I also think that moving to Cassandra or any other NoSQL store will add
>>>>>>>> unnecessary complexity to your solution. Also, designing proper NoSQL
>>>>>>>> data models (easy to change, easy to query) is hard; AFAIK it requires
>>>>>>>> lots of experience and understanding of data structures and queries.
>>>>>>>> Migrating from one NoSQL technology to another can also require a
>>>>>>>> complete rewrite. And current relational databases can handle heavy
>>>>>>>> loads, short of Google, Twitter, Amazon and Facebook like loads. I don't
>>>>>>>> think Airavata will see Google and Amazon like loads.
>>>>>>>> 
>>>>>>>> If constant change to the data model is the problem, I think the best
>>>>>>>> option is to abstract the registry implementation to something like the
>>>>>>>> collections and resources used in WSO2 Registry [1], or something
>>>>>>>> suitable for the Airavata context. That will make it easy to handle
>>>>>>>> changes in the data model.
>>>>>>> You stated it right, Milinda: Airavata does not have scaling needs that
>>>>>>> go beyond RDBMS limits, but it does need this abstraction.
>>>>>>> 
>>>>>>> Can anyone elaborate more on the collections and resources used in WSO2
>>>>>>> Registry?
>>>>>>> 
>>>>>>> Suresh
>>>>>>> 
>>>>>>>> Also, don't let the technologies drive the design decisions. It's always
>>>>>>>> better to let use cases drive the design decisions.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Milinda
>>>>>>>> 
>>>>>>>> [1] http://wso2.com/products/governance-registry/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I'm not trying to discourage your exploration of NoSQL databases, but I
>>>>>>>>> have the following concern.
>>>>>>>>> 
>>>>>>>>> Your database schema is moderately complex (even for an RDBMS it seems
>>>>>>>>> complex) and the data size is relatively small. I'm not sure about the
>>>>>>>>> tools currently available, but I think you will need to write more code
>>>>>>>>> to support all your requirements in a NoSQL database. Writing more code
>>>>>>>>> and allowing redundancy to support *relatively small* and *structured
>>>>>>>>> data* doesn't seem right to me. Maybe I'm wrong and there are better
>>>>>>>>> tools in NoSQL than in RDBMS, which I doubt.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Supun..
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <[email protected]> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>> Airavata is actively migrating to a Thrift API for the RESTless design
>>>>>>>>>> and to facilitate various language bindings from client gateways. The
>>>>>>>>>> programming language support in Thrift has so far been very
>>>>>>>>>> encouraging. The current architecture looks like Figure 1 at [1].
>>>>>>>>>> 
>>>>>>>>>> Language-specific clients will be released as Thrift SDKs (similar to
>>>>>>>>>> the Evernote SDKs [2]). These clients will be integrated into gateway
>>>>>>>>>> portals which connect to the API Server. The API operations broker the
>>>>>>>>>> simple calls into one or more backend CPI calls (Airavata internal
>>>>>>>>>> component interfaces). An example set of mappings is illustrated in
>>>>>>>>>> Figure 2 at [1]. The current draft of the Thrift API for version 0.12
>>>>>>>>>> is at [3]; please pay attention to the experiment model at [4].
>>>>>>>>>> 
>>>>>>>>>> For the persistent store, we have had a few iterations of the Airavata
>>>>>>>>>> Registry, shifting from a legacy XRegistry to JackRabbit to the current
>>>>>>>>>> OpenJPA-based registry. To allow the API and the associated data models
>>>>>>>>>> to evolve, it will be useful to explore object databases so we can
>>>>>>>>>> store the serialized version of Thrift objects directly. But it would
>>>>>>>>>> be nice to have all (or most) of the fields queryable. This calls for a
>>>>>>>>>> more column-family-oriented design among the NoSQL approaches.
>>>>>>>>>> 
>>>>>>>>>> Any recommendations for a registry architecture?
>>>>>>>>>> 
>>>>>>>>>> Quickly hacking through, I find the following approach a viable one:
>>>>>>>>>> ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can
>>>>>>>>>> benefit immediately from the replication and reliability of Cassandra,
>>>>>>>>>> and from its scalability in the near future. Some operations on the
>>>>>>>>>> model objects, like experiment creation, will need strong consistency,
>>>>>>>>>> while most of the monitoring can live with eventual consistency.
>>>>>>>>>> 
>>>>>>>>>> Critical comments please?
>>>>>>>>>> 
>>>>>>>>>> Thanks for your time,
>>>>>>>>>> Suresh
>>>>>>>>>> 
>>>>>>>>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>>>>>>>> [2] - https://dev.evernote.com/doc/
>>>>>>>>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>>>>>>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>>>>>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>>>>>>>> [6] - https://github.com/Netflix/astyanax
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Supun Kamburugamuva
>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Milinda Pathirage
>>>>>>>> PhD Student Indiana University, Bloomington;
>>>>>>>> E-mail: [email protected]
>>>>>>>> Web: http://mpathirage.com
>>>>>>>> Blog: http://blog.mpathirage.com
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Milinda Pathirage
>>>>>> PhD Student Indiana University, Bloomington;
>>>>>> E-mail: [email protected]
>>>>>> Web: http://mpathirage.com
>>>>>> Blog: http://blog.mpathirage.com
>>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Best Regards,
>> Shameera Rathnayaka.
>> 
>> email: shameera AT apache.org , shameerainfo AT gmail.com
>> Blog : http://shameerarathnayaka.blogspot.com/
