Re: Object Database Suggestions for Airavata Registry

Suresh Marru Mon, 24 Feb 2014 10:16:19 -0800

On Feb 24, 2014, at 12:07 PM, Eran Chinthaka Withana <[email protected]> wrote:


> Haha, I don't wanna start a philosophical war here, but calling NoSQL
> still in its infancy, and saying that NoSQL data models are hard and
> require lots of experience and understanding of data structures and
> queries, is a bit surprising to me. It's all about the use cases and
> finding the correct tool to help with them.

We can set aside the philosophical discussion. When there is a strong
structure and relational needs with good support for transactions, it is
obviously better to stick with SQL. And certainly lots of real-world
production deployments (Facebook, Twitter and Netflix, to name a few) have
proven NoSQL successful for their use cases. So I agree with everyone here
that we need the use cases first. I only sent the API as the driver and am
limiting the discussion to it so as not to confuse things. If we go beyond
that, there are other needs driving scalable metadata, like shredding data
files, extracting thousands of parameters and making them queryable to allow
experimentation. Others exist too: checking the covariance of mathematical
models and making decisions on executions in real time, tapping into streams
of data and forking workflows on interesting signatures, and so forth. None
of these are really at the scale or complexity of so-called Big Data.

Another way of looking at Cassandra-like solutions is their built-in
reliability. We could argue again about why not use a MySQL master-slave
setup and so forth. But some things are appealing: for example, with
Cassandra cluster nodes on Amazon, if the data center running Airavata goes
offline, the services running on EC2 can pick up almost where they left off,
with a full, identical copy of the data. If there is software that handles
such things for us so that we do not need to worry about them, why not?
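
To make the replication trade-off a little more concrete: in Dynamo-style
stores such as Cassandra, a read is guaranteed to see the latest acknowledged
write when the read and write replica sets must overlap, i.e. R + W > N for
replication factor N. The sketch below is plain arithmetic, not Cassandra
client code; the function names are hypothetical, and it assumes a single
data center with a replication factor of 3. It also shows how the split in
the original proposal further down this thread (strong consistency for
experiment creation, eventual consistency for monitoring) could map onto
per-request consistency levels.

```python
# Sketch of Dynamo/Cassandra-style quorum arithmetic. N = replication factor,
# R / W = replicas that must answer a read / acknowledge a write.
# Function names are hypothetical; this is not a Cassandra client API.


def quorum(n: int) -> int:
    """Majority of replicas, floor(N / 2) + 1, as used by Cassandra's QUORUM."""
    return n // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Every read overlaps the latest acknowledged write iff R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while both reads and writes still succeed."""
    return n - max(r, w)


if __name__ == "__main__":
    n = 3  # assumed: three replicas, e.g. some of them on EC2
    q = quorum(n)
    # Experiment creation: QUORUM reads and writes overlap (2 + 2 > 3).
    print("experiment:", q, is_strongly_consistent(n, q, q), tolerated_failures(n, q, q))
    # Monitoring: ONE/ONE is cheaper but only eventually consistent.
    print("monitoring:", is_strongly_consistent(n, 1, 1), tolerated_failures(n, 1, 1))
```

Run as a script, this prints `experiment: 2 True 1` and
`monitoring: False 2`: with three replicas, QUORUM reads and writes stay
consistent while one replica is down, and ONE/ONE survives two replicas
being down at the cost of possibly stale reads.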

But none of this is what motivates the jump Airavata needs to take. The
motivation is solving the current problems first, which means helping
Airavata evolve over the next few months. Whether the solution is SQL, Not
Only SQL, something hybrid or a nice overlay over MySQL is open for
discussion. Once the API is stable (and can remain so for at least a year),
it could very well be argued at that point that we should have a
well-defined schema and stay in the MySQL world. I do not have a strong
opinion either way, and I lack the first-hand NoSQL experience that many of
you have, but I certainly do not want to rule it out based on popular
perceptions.
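
One way to read the "something hybrid or a nice overlay over MySQL" option,
together with the original proposal's wish to store serialized Thrift objects
while keeping most fields queryable, is a table that holds the full
serialized object plus a few promoted, indexed columns. This is a minimal
sketch under stated assumptions, not Airavata's registry: sqlite3 stands in
for MySQL, JSON stands in for Thrift serialization, and the table and field
names are hypothetical.

```python
import json
import sqlite3

# Hypothetical "overlay" layout: the whole object is stored serialized, and a few
# fields are copied into indexed columns so they stay queryable.
# sqlite3 stands in for MySQL and JSON stands in for Thrift serialization.
QUERYABLE_FIELDS = ("user_name", "status")  # hypothetical field names


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        " experiment_id TEXT PRIMARY KEY,"
        " user_name TEXT,"
        " status TEXT,"
        " payload TEXT NOT NULL)"  # full serialized object
    )
    conn.execute("CREATE INDEX idx_experiment_user ON experiment (user_name)")
    return conn


def save(conn: sqlite3.Connection, experiment: dict) -> None:
    payload = json.dumps(experiment, sort_keys=True)
    # INSERT OR REPLACE keeps one row per experiment_id (upsert on the key).
    conn.execute(
        "INSERT OR REPLACE INTO experiment VALUES (?, ?, ?, ?)",
        (experiment["experiment_id"], experiment.get("user_name"),
         experiment.get("status"), payload),
    )


def find_by(conn: sqlite3.Connection, column: str, value: str) -> list:
    # Only whitelisted column names are interpolated into the SQL text.
    if column not in QUERYABLE_FIELDS:
        raise ValueError(f"{column} is not a queryable column")
    rows = conn.execute(
        f"SELECT payload FROM experiment WHERE {column} = ? ORDER BY experiment_id",
        (value,),
    )
    return [json.loads(payload) for (payload,) in rows]


if __name__ == "__main__":
    store = open_store()
    # Fields not promoted to columns (e.g. "inputs") need no schema change.
    save(store, {"experiment_id": "exp-1", "user_name": "alice",
                 "status": "CREATED", "inputs": {"x": 1}})
    save(store, {"experiment_id": "exp-2", "user_name": "bob", "status": "LAUNCHED"})
    print([e["experiment_id"] for e in find_by(store, "user_name", "alice")])
```

In this layout only the promoted columns carry a schema; other fields travel
inside the payload until someone needs to query them, at which point a
column could be added and back-filled from the stored objects.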

Suresh

> 
> Let's wait until Suresh or someone else comes up with a set of use cases
> for the registry, so we can have a valid, constructive discussion. Maybe
> NoSQL is not a good fit and SQL is good for this use case, but we cannot
> make decisions now.
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Mon, Feb 24, 2014 at 8:20 AM, Milinda Pathirage <[email protected]> wrote:
> 
>> I also think that moving to Cassandra or any other NoSQL database will add
>> unnecessary complexity to your solution. Also, designing proper NoSQL data
>> models (easy to evolve, easy to query) is hard (AFAIK, it requires lots of
>> experience and understanding of data structures and queries). Migrating
>> from one NoSQL technology to another can also require a complete rewrite.
>> And current relational databases can handle heavy loads, short of Google-,
>> Twitter-, Amazon- and Facebook-scale loads. I don't think Airavata will see
>> Google- or Amazon-like loads.
>> 
>> If constant changes to the data model are the problem, I think the best
>> option is to abstract the registry implementation into something like the
>> collections and resources used in WSO2 Registry [1], or something suited to
>> the Airavata context. That will make it easy to handle changes in the data
>> model.
>> 
>> Also, don't let technologies drive design decisions. It's always better to
>> let use cases drive them.
>> 
>> Thanks
>> Milinda
>> 
>> [1] http://wso2.com/products/governance-registry/
>> 
>> 
>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <[email protected]> wrote:
>> 
>>> Hi all,
>>> 
>>> I'm not trying to discourage your exploration of NoSQL databases, but I
>>> have the following concern.
>>> 
>>> Your database schema is moderately complex (even for an RDBMS it seems
>>> complex), and the data size is relatively small. I'm not sure about the
>>> tools currently available, but I think you will need to write more code
>>> to support all your requirements in a NoSQL database. So writing more code
>>> and allowing redundancy to support *relatively small* and *structured
>>> data* doesn't seem right to me. Maybe I'm wrong and there are better tools
>>> in NoSQL than in RDBMS, which I doubt.
>>> 
>>> Thanks,
>>> Supun..
>>> 
>>> 
>>> 
>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <[email protected]> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> Airavata is actively migrating to a Thrift API for the RESTless design
>>>> and to facilitate various language bindings for client gateways. Thrift's
>>>> programming language support has so far been very encouraging. The
>>>> current architecture looks like Figure 1 at [1].
>>>> 
>>>> Language-specific clients will be released as Thrift SDKs (similar to the
>>>> Evernote SDKs [2]). These clients will be integrated into gateway portals,
>>>> which connect to the API server. The API operations broker these simple
>>>> calls into one or more backend CPI calls (Airavata internal component
>>>> interfaces). An example set of mappings is illustrated in Figure 2 at
>>>> [1]. The current draft of the Thrift API for version 0.12 is at [3];
>>>> please pay attention to the experiment model at [4].
>>>> 
>>>> For the persistent store, we have had a few iterations of the Airavata
>>>> Registry, shifting from a legacy XRegistry to Jackrabbit and now to an
>>>> OpenJPA-based registry. To allow the API and the associated data models to
>>>> evolve, it will be useful to explore object databases so we can store
>>>> serialized Thrift objects directly. But it would be nice to have all (or
>>>> most) of the fields queryable. Among NoSQL approaches, this calls for more
>>>> of a column-family design.
>>>> 
>>>> Any recommendations for a registry architecture?
>>>> 
>>>> From some quick hacking, I find the following approach viable:
>>>> ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can
>>>> benefit immediately from Cassandra's replication and reliability, and
>>>> from its scalability in the near future. Some model objects, such as
>>>> experiment creation, will need strong consistency, while most of the
>>>> monitoring can live with eventual consistency.
>>>> 
>>>> Critical comments, please?
>>>> 
>>>> Thanks for your time,
>>>> Suresh
>>>> 
>>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>> [2] - https://dev.evernote.com/doc/
>>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>> [6] - https://github.com/Netflix/astyanax
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Supun Kamburugamuva
>>> Member, Apache Software Foundation; http://www.apache.org
>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>> Blog: http://supunk.blogspot.com
>>> 
>> 
>> 
>> 
>> --
>> Milinda Pathirage
>> PhD Student Indiana University, Bloomington;
>> E-mail: [email protected]
>> Web: http://mpathirage.com
>> Blog: http://blog.mpathirage.com
>>
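
Milinda's suggestion above, abstracting the registry into collections and
resources, can be illustrated with a minimal in-memory sketch. It borrows
only the general collection/resource idea; the class and method names
(`Registry`, `Resource`, `put`, `children`, `find`) and the example paths and
properties are hypothetical and are not the WSO2 Registry API. Resources hold
opaque content (for example, a serialized Thrift object) plus string
properties, and collections are implied by path prefixes, so a new
data-model field becomes a new property rather than a schema migration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    content: bytes = b""  # opaque payload, e.g. a serialized Thrift object
    properties: Dict[str, str] = field(default_factory=dict)  # queryable metadata


class Registry:
    """Hypothetical path-based registry: collections are path prefixes,
    resources are leaves holding content plus string properties."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}
        self._collections = {"/"}

    def put(self, path: str, resource: Resource) -> None:
        parts = path.strip("/").split("/")
        # Every parent path becomes a collection implicitly.
        for i in range(1, len(parts)):
            self._collections.add("/" + "/".join(parts[:i]))
        self._resources["/" + "/".join(parts)] = resource

    def get(self, path: str) -> Optional[Resource]:
        return self._resources.get(path)

    def children(self, collection: str) -> List[str]:
        # Immediate child names (resources or sub-collections) under a collection.
        prefix = collection.rstrip("/") + "/"
        names = set()
        for path in list(self._resources) + list(self._collections):
            if path.startswith(prefix) and path != prefix:
                names.add(path[len(prefix):].split("/")[0])
        return sorted(names)

    def find(self, key: str, value: str) -> List[str]:
        # Linear scan over properties; a real store would index them.
        return sorted(p for p, r in self._resources.items()
                      if r.properties.get(key) == value)


if __name__ == "__main__":
    reg = Registry()
    reg.put("/experiments/exp-1", Resource(b"...", {"status": "CREATED", "user": "alice"}))
    reg.put("/experiments/exp-2", Resource(b"...", {"status": "LAUNCHED", "user": "bob"}))
    print(reg.children("/"), reg.children("/experiments"), reg.find("status", "LAUNCHED"))
```

The point of the design is that the storage layer only ever sees paths,
blobs and key/value properties, so it can stay stable (and be backed by
either a relational or a NoSQL store) while the Thrift data model evolves.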
