Re: Object Database Suggestions for Airavata Registry

Marlon Pierce Mon, 24 Feb 2014 04:57:07 -0800

Registry use cases: Eran's point here is important, of course.  Can we
collectively articulate those?  I suggest we focus on API use cases only
(the capabilities we need to provide to gateways) rather than internal
use cases (how the orchestrator and registry interact).



Marlon


On 2/24/14 1:40 AM, Eran Chinthaka Withana wrote:
> Hi Suresh,
>
> I will try to keep the focus of this mail thread on the object DB
> selection. But I will also share some comments about the architecture and
> the API since you mentioned those. Please feel free to spawn separate
> threads on those if we want to keep this thread focused on the object DB.
>
> Please see the comments in-line.
>
> Thanks,
> Eran Chinthaka Withana
>
>
> On Sun, Feb 23, 2014 at 2:20 PM, Suresh Marru <[email protected]> wrote:
>
>> Hi All,
>>
>> Airavata is actively migrating to use Thrift API for the RESTless design
>> and to facilitate various language bindings from client gateways. The
>> programming language support in thrift has been so far very encouraging.
>> The current architecture is looking like Figure 1 at [1].
>>
> A quick question on the architecture. It seems like the API is directly
> contacting the Orchestrator to schedule workflows. I honestly think this is
> not a scalable approach due to the impedance mismatch between these two
> systems. Are we considering decoupling them with a message queue and
> going for a worker-based architecture?
>
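A minimal sketch of the queue-plus-workers idea, assuming an in-process queue stands in for a real message broker. The function name `run_workers` and the experiment ids are hypothetical and not part of Airavata; the point is only that the API side enqueues a request and returns, while workers pick up the scheduling work.

```python
import queue
import threading


def run_workers(experiment_ids, num_workers=2):
    """Enqueue experiment ids and let worker threads drain the queue."""
    tasks = queue.Queue()
    launched = []
    lock = threading.Lock()

    def worker():
        while True:
            experiment_id = tasks.get()
            if experiment_id is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            # Stand-in for the orchestrator scheduling the workflow.
            with lock:
                launched.append(experiment_id)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The API side only enqueues; it never calls the orchestrator directly.
    for experiment_id in experiment_ids:
        tasks.put(experiment_id)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker

    tasks.join()
    for t in threads:
        t.join()
    return sorted(launched)
```

With a real broker the producer and the workers would live in separate processes, which is what lets each side scale independently.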
> Also, the "API Mapping Diagram" hints at a "kind of" stateful service
> with a sequential set of steps. For example, since there is no method to
> get all experiments, I assume the client is supposed to remember the
> experiment ids and invoke each of these methods in sequence. I'd encourage
> thinking in terms of stateless invocation, where any client can invoke
> each of these methods without prior knowledge of the state of the
> execution.
>
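To illustrate the stateless-invocation point, here is a small sketch, assuming an in-memory dictionary in place of the registry. The class and method names (`ExperimentService`, `list_experiments`, and so on) are hypothetical and are not taken from the draft Thrift API. Because the service can enumerate experiments itself, a client that remembers nothing can still discover ids and query them.

```python
class ExperimentService:
    """Toy service: all state lives server-side, none in the client."""

    def __init__(self):
        self._experiments = {}  # experiment_id -> record

    def create_experiment(self, user, name):
        experiment_id = f"exp-{len(self._experiments) + 1}"
        self._experiments[experiment_id] = {
            "user": user,
            "name": name,
            "status": "CREATED",
        }
        return experiment_id

    def list_experiments(self, user):
        # The "get all experiments" call the mapping diagram lacks.
        return sorted(
            eid for eid, rec in self._experiments.items() if rec["user"] == user
        )

    def get_status(self, experiment_id):
        return self._experiments[experiment_id]["status"]
```

A fresh client can call `list_experiments("alice")` and then `get_status` on each returned id, in any order, without having made the original `create_experiment` calls.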
>> Language specific clients will be released as thrift SDK's (similar to
>> evernote sdk's [1]). These clients will be integrated into gateway portals
>> which connect to the API Server. The API operations broker the simple calls
>> into one or more backend CPI calls (Airavata internal component
>> interfaces).  An example set of mappings is illustrated in Figure 2 at
>> [1]. The current draft of thrift API for version 0.12 is at [3], please pay
>> attention to experiment model at [4].
>>
> Comments on thrift IDL
>
> 1. The input and output parameters do not have constraint specifiers
> (required vs optional) and are left at the default. This will be very
> challenging when we try to improve the APIs in later versions, and it's
> standard practice to ALWAYS specify either optional or required.
>
> 2. Consider using typedefs to reduce repetitive names. For example,
> defining airavataErrors.InvalidRequestException as a type will let you
> simply refer to it as InvalidRequestException.
>
> 3. Introduce a parameter for each method to pass the API key. This will be
> helpful in the future to identify individual clients, enforce SLAs, log
> requests, etc.
>
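Point 3 can be sketched roughly as follows. This is an illustration only: the key set, the `with_api_key` decorator, and `get_experiment_status` are all hypothetical, and a real Thrift service would declare the key as a field in the IDL rather than use a Python decorator. The sketch just shows why a leading API-key argument makes per-client accounting straightforward.

```python
import functools
from collections import Counter

VALID_KEYS = {"gateway-a", "gateway-b"}  # hypothetical registered API keys
request_counts = Counter()               # per-client request tally


def with_api_key(method):
    """Require an API key as the first argument and count requests per key."""

    @functools.wraps(method)
    def wrapper(api_key, *args, **kwargs):
        if api_key not in VALID_KEYS:
            raise PermissionError(f"unknown API key: {api_key!r}")
        request_counts[api_key] += 1  # hook for logging or SLA enforcement
        return method(*args, **kwargs)

    return wrapper


@with_api_key
def get_experiment_status(experiment_id):
    # Stand-in for a real lookup.
    return "CREATED"
```

Calling `get_experiment_status("gateway-a", "exp-1")` returns the status and increments that client's counter; an unregistered key is rejected before the method body runs.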
>
>> For the persistent store, we had a few iterations of Airavata Registry
>> shifting from a legacy XRegistry to JackRabbit to now an OpenJPA based
>> registry. To allow the API and the associated data models to evolve, it
>> will be useful to explore object databases so we can store the serialized
>> version of thrift objects directly. But it will be nice to have all (or
>> most) of the fields queryable.
>
> FYI, we did a storage space analysis some time back and, for smaller
> objects, the overhead of storing the object in thrift serialized form vs
> each attribute as a column is the same. Also, enabling compression on each
> column family will shrink the difference further. So, I'd first start with
> a field-based object representation.
>
> Having said that, making each attribute a column doesn't by itself make it
> queryable. We have to either create secondary indexes or do column slices,
> and both of these are a bit expensive. So, as always with NoSQL storage
> systems, we should know the queries ahead of time before even loosely
> defining storage schemas.
>
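A rough sketch of what "knowing the queries ahead of time" implies, assuming the one known query is "all experiments for a user". Plain dictionaries stand in for column families, and `ExperimentStore` and its methods are hypothetical names. The lookup table is maintained at write time, so the read is a direct key lookup instead of a scan or a secondary-index query.

```python
class ExperimentStore:
    """Toy query-first layout: one table per known query."""

    def __init__(self):
        self.by_id = {}        # experiment_id -> record
        self.ids_by_user = {}  # user -> [experiment_id, ...], kept in sync on write

    def put(self, experiment_id, user, status):
        previous = self.by_id.get(experiment_id)
        if previous is not None and previous["user"] != user:
            # Owner changed: drop the stale entry from the old user's list.
            self.ids_by_user[previous["user"]].remove(experiment_id)
        if previous is None or previous["user"] != user:
            self.ids_by_user.setdefault(user, []).append(experiment_id)
        self.by_id[experiment_id] = {"user": user, "status": status}

    def experiments_for_user(self, user):
        # Served from the precomputed table; no scan over by_id.
        return list(self.ids_by_user.get(user, []))
```

The cost is extra work on every write, and each additional query pattern needs its own table, which is why the queries have to be listed before the schema is drawn up.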
>
>> This calls for a more column-family design of any NoSQL approaches.
>>
>> Any recommendations for a registry architecture?
>>
> It will be easier to answer this question if you can list the use cases for
> the registry. I don't think most people on this list know all the use
> cases. I myself have a very faint memory :)
>
>
>> Quickly hacking through I find the following approach a viable one:
>> ZombieDB[5] over astyanax[6] which talks to Cassandra.
>
> Not sure why you picked Astyanax (even though it originated at Netflix
> and boasts better performance than Hector thanks to its token range
> awareness). I'd rather pick Hector or Astyanax based on the performance
> numbers you get. We did some work on this earlier and came up with an
> abstraction over these two clients so that we can switch easily between
> them: https://github.com/WizeCommerce/hecuba
>
> In any case, I think it's a bit too early to talk about this.
>
> I haven't used ZombieDB before, but before we pick any technology I'd spend
> a bit more time to list down the use cases.
>
>
>> Airavata can benefit immediately from the replication and reliability of
>> Cassandra and from its scalability in the near future. Some of the model
>> objects like experiment creation will need to have strong consistency and
>> most of the monitoring can live with eventual consistency.
>>
> Even though Cassandra is supposed to compromise C for AP (from the CAP
> theorem), there are knobs (like read and write consistency levels) we can
> use to get strong consistency. So I think we are covered here.
>
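The consistency-level knobs follow a simple rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names below are hypothetical; the arithmetic is the standard quorum rule, shown as a sketch rather than as driver code.

```python
def quorum(replication_factor):
    """Replica count for a QUORUM-level operation: a strict majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor
```

For a replication factor of 3, quorum reads and writes touch 2 replicas each, and 2 + 2 > 3, so experiment creation could use that setting while monitoring data reads and writes a single replica and accepts eventual consistency.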
