I could respond to each thread in detail, but the general sense seems to be a question about the use case, so let me try to explain it and see if it comes across. I am fully on board with the perceptions of relational vs. NoSQL, and I also agree that current Airavata needs do not map directly onto a NoSQL migration. I will summarize the driving motivation:
Background: The key problem Airavata needs to solve is getting the API and the associated data model right. The current relational database (with an OpenJPA overlay) is severely limiting the evolution of the API. Science gateways are by nature very science-domain and use-case specific, yet Airavata is tackling the challenging problem of providing a generic API that will meet and enable these use-case-centric integrations. The issue is that we are designing an API to handle a wide range of known (and some foreseen) use cases while at the same time trying to keep it simple and yet flexible. The only way we can arrive at a reasonable, normalized version of the API is by hands-on programming against it.

Within the Airavata PMC itself, we can solicit half a dozen different ways to visualize the data model, and we need a few hackathons with real end users of Airavata until we find common ground. All of this needs rapid prototyping. Currently, a slight change in the data model takes close to two weeks of re-architecting the OpenJPA-based registry. There are many known problems with the current draft of the data model that have to be set aside in the interest of overall system progress.

So the driving motivation is certainly not any of the classic NoSQL needs, but a simple one: can we have a registry that is schema-agnostic and yet queryable on most of the fields in the model? Can we try 10 different variants of the data model (and hence the API) within the next 3 months, with focused hackathons, and arrive at a stable 1.0 version of the API?

Part one of the discussion was successful in that it raised everyone’s eyebrows. Now that we have everyone’s attention, what would be a good data store for Airavata that will meet these needs?

P.S. Additional background: The API has been in development for close to 3 years and is falling short of pleasing a majority.
Many academic standardization efforts fail terribly by pretending to understand all use cases and proposing a standard way (which ends up unnecessarily complex and unusable). Science is by nature evolutionary, and restricting the capabilities to a known set of use cases prevents the use of middleware for real scientific research (limiting it to proof-of-concept demonstrations, papers, and educational use). The only way to meet the challenges of these evolving needs is to have a framework that can evolve with minimal disruption.

Great thoughts so far, please keep ’em coming until we find a solution driven not by technical fancies but by the real need.

Cheers,
Suresh

On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <[email protected]> wrote:

> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <[email protected]> wrote:
>
>> I also think that moving to Cassandra or any other NoSQL will add
>> unnecessary complexity to your solution. Also designing proper (easy to
>> manage changes, easy to query) NoSQL data models are hard (AFAIK, require
>> lots of experience and understanding about data structures and queries).
>> Also migrating from one NoSQL technology to other can require complete
>> re-write. And current relational databases can handle heavy loads except
>> Google, Twitter, Amazon and Facebook like loads. I don't think Airavata
>> will see Google and Amazon like loads.
>>
> +1
>
>>
>> If the constant changes to the data model is the problem, I think best
>> option is to abstract registry implementation to something like collections
>> and resources used in WSO2 Registry [1] or something suitable for Airavata
>> context. That will make it easy to handle changes in data model.
>>
>> Also don't let the technologies drive design decision. It's always better to
>> let use cases drive the design decision.
>>
> +1
>
> Regards
> Lahiru
>
>>
>> Thanks
>> Milinda
>>
>> [1] http://wso2.com/products/governance-registry/
>>
>>
>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I'm not trying to discourage you on your exploration to NoSQL databases. I
>>> have the following concern.
>>>
>>> Your database schema is moderately complex - even for a RDBMS it seems
>>> complex and the data size is relatively small. I'm not sure about the
>>> current tools available but I think you will need to write more code to
>>> support all your requirements in a NoSQL database. So writing more code and
>>> allow redundancy to support *relatively small* and *structured
>>> data* doesn't seem right to me. Maybe I'm wrong and there are better tools in
>>> NoSQL than RDBMS, which I doubt.
>>>
>>> Thanks,
>>> Supun..
>>>
>>>
>>>
>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Airavata is actively migrating to use Thrift API for the RESTless design
>>>> and to facilitate various language bindings from client gateways. The
>>>> programming language support in thrift has been so far very encouraging.
>>>> The current architecture is looking like Figure 1 at [1].
>>>>
>>>> Language specific clients will be released as thrift SDK's (similar to
>>>> evernote sdk's [2]). These clients will be integrated into gateway portals
>>>> which connect to the API Server. The API operations brokers the simple calls
>>>> into one or more backend CPI calls (Airavata internal component
>>>> interfaces). An example set of mappings are illustrated in Figure 2 at
>>>> [1]. The current draft of thrift API for version 0.12 is at [3], please pay
>>>> attention to experiment model at [4].
>>>>
>>>> For the persistent store, we had few iterations of Airavata Registry
>>>> shifting from a legacy XRegistry to JackRabbit to now a OpenJPA based
>>>> registry.
>>>> To allow the API and the associated data models to evolve, it
>>>> will be useful to explore object databases so we can store the serialized
>>>> version of thrift objects directly. But it will be nice to have all (or
>>>> most) of the fields queryable. This calls for a more column-family design
>>>> of any NoSQL approaches.
>>>>
>>>> Any recommendations for a registry architecture?
>>>>
>>>> Quickly hacking through I find the following approach a viable one:
>>>> ZombieDB[5] over astyanax[6] which talks to Cassandra. Airavata can benefit
>>>> immediately from the replication and reliability of cassandra and
>>>> scalability in near future. Some of the model objects like experiment
>>>> creation will need to have strong consistency and most of the monitoring
>>>> can live with eventual consistency.
>>>>
>>>> Critical comments please?
>>>>
>>>> Thanks for your time,
>>>> Suresh
>>>>
>>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>> [2] - https://dev.evernote.com/doc/
>>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>> [6] - https://github.com/Netflix/astyanax
>>>>
>>>
>>>
>>> --
>>> Supun Kamburugamuva
>>> Member, Apache Software Foundation; http://www.apache.org
>>> E-mail: [email protected]; Mobile: +1 812 369 6762
>>> Blog: http://supunk.blogspot.com
>>>
>>
>>
>> --
>> Milinda Pathirage
>> PhD Student Indiana University, Bloomington;
>> E-mail: [email protected]
>> Web: http://mpathirage.com
>> Blog: http://blog.mpathirage.com
>>
>
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University
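
To make the question at the top of this message concrete (a registry that is schema-agnostic and yet queryable on most fields), here is a rough sketch of the idea, not a proposal for the actual implementation. Everything in it is an assumption for illustration: the class name `SchemaAgnosticRegistry`, the helper `flatten`, and the sample field names are hypothetical; plain Python dicts stand in for serialized Thrift objects, and an in-memory index stands in for whatever data store is eventually chosen.

```python
from typing import Any, Dict, Iterator, List, Optional, Set, Tuple


def flatten(obj: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, scalar) pairs for every leaf of a nested dict/list."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix, obj


class SchemaAgnosticRegistry:
    """Hypothetical in-memory registry: stores any document shape, indexes every field."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        # (dotted field path, value) -> ids of documents containing that pair
        self._index: Dict[Tuple[str, Any], Set[str]] = {}

    def put(self, doc_id: str, doc: dict) -> None:
        """Store or overwrite a document; no schema is declared up front."""
        self.delete(doc_id)
        self._docs[doc_id] = doc
        for path, value in flatten(doc):
            self._index.setdefault((path, value), set()).add(doc_id)

    def delete(self, doc_id: str) -> None:
        """Remove a document and its index entries, if it exists."""
        old = self._docs.pop(doc_id, None)
        if old is not None:
            for path, value in flatten(old):
                self._index[(path, value)].discard(doc_id)

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id]

    def find(self, criteria: Dict[str, Any]) -> List[str]:
        """Return sorted ids of documents matching every (path == value) pair."""
        matches: Optional[Set[str]] = None
        for path, value in criteria.items():
            ids = self._index.get((path, value), set())
            matches = set(ids) if matches is None else matches & ids
        return sorted(self._docs if matches is None else matches)


# Two variants of a made-up experiment model living side by side.
registry = SchemaAgnosticRegistry()
registry.put("exp-1", {"name": "echo", "owner": "alice"})
registry.put("exp-2", {"name": "echo", "owner": "bob",
                       "scheduling": {"queue": "normal", "nodes": 2}})

print(registry.find({"name": "echo"}))
print(registry.find({"scheduling.queue": "normal"}))
```

The point of the sketch is only that a new field (here the made-up `scheduling` block) becomes queryable the moment a document carrying it is stored, with no migration step; consistency, persistence, and range queries are deliberately left out.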
