My point of view here is that the API use cases can be articulated independently of the internal CPI use cases, and are necessary but not sufficient for the overall design.
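
To make the distinction concrete, here is a rough sketch of the layering I mean. The interface and method names below are hypothetical, invented only for illustration -- they are not the actual Airavata CPI signatures -- but they show one gateway-facing API use case sitting on top of internal CPI interactions.

    // Rough sketch only: interface and method names are hypothetical,
    // not the real Airavata 0.12 CPI signatures.

    interface RegistryCpi {                  // internal component interface (assumed shape)
        String storeExperiment(String experimentSpec);
    }

    interface OrchestratorCpi {              // internal component interface (assumed shape)
        void scheduleExperiment(String experimentId);
    }

    class AiravataApiHandler {
        private final RegistryCpi registry;
        private final OrchestratorCpi orchestrator;

        AiravataApiHandler(RegistryCpi registry, OrchestratorCpi orchestrator) {
            this.registry = registry;
            this.orchestrator = orchestrator;
        }

        // The API use case a gateway sees: "create and launch an experiment".
        // How the registry and orchestrator interact behind it is an internal
        // question -- necessary for the design, but not part of the API.
        public String createAndLaunchExperiment(String experimentSpec) {
            String id = registry.storeExperiment(experimentSpec);
            orchestrator.scheduleExperiment(id);
            return id;
        }
    }
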
Marlon

On 2/24/14 7:56 AM, Marlon Pierce wrote:
> Registry use cases: Eran's point here is important, of course. Can we
> collectively articulate those? I suggest we focus on API use cases only
> (the capabilities we need to provide to gateways) rather than internal
> use cases (how the orchestrator and registry interact).
>
>
> Marlon
>
>
> On 2/24/14 1:40 AM, Eran Chinthaka Withana wrote:
>> Hi Suresh,
>>
>> I will try to keep the focus of this mail thread on the object DB
>> selection. But I will also share some comments about the architecture
>> and the API since you mentioned those. Please feel free to spawn
>> separate threads on those if we want to keep this thread focused on the
>> object DB.
>>
>> Please see the comments in-line.
>>
>> Thanks,
>> Eran Chinthaka Withana
>>
>>
>> On Sun, Feb 23, 2014 at 2:20 PM, Suresh Marru <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> Airavata is actively migrating to a Thrift API for the RESTless design
>>> and to facilitate various language bindings for client gateways. The
>>> programming language support in Thrift has so far been very
>>> encouraging. The current architecture looks like Figure 1 at [1].
>>>
>> Quick questions on the architecture. It seems like the API is directly
>> contacting the Orchestrator to schedule workflows. I honestly think this
>> is not a scalable approach because of the impedance mismatch between
>> these two systems. Are we considering decoupling the two with a message
>> queue and going for a worker-based architecture?
>>
>> Also, the "API Mapping Diagram" hints at a "kind of" stateful service
>> with a sequential set of steps. For example, because there is no method
>> to get all experiments, I assume the client is supposed to remember the
>> experiment ids and invoke each of these methods in sequence. I'd
>> encourage thinking in terms of stateless invocation, where any client
>> can invoke each of these methods without prior knowledge of the state of
>> the execution.
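
A minimal sketch of the queue-based decoupling suggested above. The class and request type are hypothetical, and java.util.concurrent stands in here for a real message broker (RabbitMQ, Kafka, or similar); the point is only that the API server enqueues and returns, while any number of orchestrator workers drain the queue at their own pace.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class LaunchRequest {
        final String experimentId;
        LaunchRequest(String experimentId) { this.experimentId = experimentId; }
    }

    class ExperimentLaunchQueue {
        private final BlockingQueue<LaunchRequest> queue = new LinkedBlockingQueue<>();

        // API server side: accept the call, enqueue, and return immediately.
        public void submit(String experimentId) {
            queue.offer(new LaunchRequest(experimentId));
        }

        // Orchestrator worker side: workers pull work as they have capacity,
        // so the API request rate and the orchestrator's scheduling rate no
        // longer have to match (the "impedance mismatch" mentioned above).
        public void runWorker() throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                LaunchRequest request = queue.take();   // blocks until work arrives
                // orchestrator.scheduleExperiment(request.experimentId);
            }
        }
    }
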
>>> Language-specific clients will be released as Thrift SDKs (similar to
>>> the Evernote SDKs [1]). These clients will be integrated into gateway
>>> portals, which connect to the API Server. The API operations broker the
>>> simple calls into one or more backend CPI calls (Airavata internal
>>> component interfaces). An example set of mappings is illustrated in
>>> Figure 2 at [1]. The current draft of the Thrift API for version 0.12
>>> is at [3]; please pay attention to the experiment model at [4].
>>>
>> Comments on the Thrift IDL:
>>
>> 1. The input and output parameters do not have constraint specifiers
>> (required vs. optional) and are left at the default. This will be very
>> challenging when we try to improve the APIs in later versions, and it's
>> standard practice to ALWAYS specify either optional or required as the
>> constraint.
>>
>> 2. Consider using typedefs to reduce repetitive names. For example,
>> defining airavataErrors.InvalidRequestException as a type will let you
>> refer to it simply as InvalidRequestException.
>>
>> 3. Introduce a parameter on each method to carry the API key. This will
>> be helpful in the future to identify individual clients, enforce SLAs,
>> log requests, etc.
>>
>>> For the persistent store, we have had a few iterations of the Airavata
>>> Registry, shifting from a legacy XRegistry to JackRabbit to now an
>>> OpenJPA-based registry. To allow the API and the associated data models
>>> to evolve, it will be useful to explore object databases so we can
>>> store the serialized version of Thrift objects directly. But it will be
>>> nice to have all (or most) of the fields queryable.
>>>
>> FYI, we did a storage space analysis some time back, and for smaller
>> objects the overhead of storing the object in Thrift-serialized form vs.
>> each attribute as a column is the same. Also, enabling compression on
>> each column family will shrink the difference further. So I'd start with
>> a field-based object representation.
>>
>> Having said that, making each attribute a column doesn't make it
>> queryable. We have to either create secondary indexes or do column
>> slices, and both of these are a bit expensive. So, as always with NoSQL
>> storage systems, we should know the queries ahead of time before even
>> loosely defining storage schemas.
>>
>>> This calls for a more column-family-oriented design in any NoSQL
>>> approach.
>>>
>>> Any recommendations for a registry architecture?
>>>
>> It will be easier to answer this question if you can list the use cases
>> for the registry. I don't think most people on this list know all the
>> use cases. I myself have a very faint memory :)
>>
>>> Quickly hacking through, I find the following approach a viable one:
>>> ZombieDB [5] over Astyanax [6], which talks to Cassandra.
>>
>> Not sure why you picked Astyanax (even though it originated at Netflix
>> and claims better performance than Hector thanks to its token-range
>> awareness). I'd rather pick Hector or Astyanax based on the performance
>> numbers you get. We did some work on this earlier and came up with an
>> abstraction over these two clients so that we can switch easily between
>> them: https://github.com/WizeCommerce/hecuba
>>
>> In any case, I think it's a bit too early to talk about this.
>>
>> I haven't used ZombieDB before, but before we pick any technology I'd
>> spend a bit more time listing the use cases.
>>
>>> Airavata can benefit immediately from the replication and reliability
>>> of Cassandra, and from its scalability in the near future. Some of the
>>> model objects, like experiment creation, will need strong consistency,
>>> and most of the monitoring can live with eventual consistency.
>>>
>> Even though Cassandra is supposed to compromise C for AP (in CAP-theorem
>> terms), there are knobs (like read and write consistency levels) we can
>> use to make it strongly consistent. So I think we are covered here.
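
A quick back-of-the-envelope check of why those knobs are enough, assuming a replication factor of 3: with QUORUM writes and QUORUM reads, every read overlaps the most recent write (R + W > N), which gives the strong consistency experiment creation needs; ONE/ONE gives that up in exchange for latency, which is fine for monitoring data. The class below is just that arithmetic, not any particular client's API.

    public class QuorumCheck {
        // A read is guaranteed to see the latest write when the read and
        // write replica counts overlap: R + W > N.
        static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
            return readReplicas + writeReplicas > replicationFactor;
        }

        public static void main(String[] args) {
            int n = 3;                 // replication factor
            int quorum = n / 2 + 1;    // = 2 when n = 3

            System.out.println("QUORUM write + QUORUM read -> strongly consistent: "
                    + readsSeeLatestWrite(n, quorum, quorum));   // true
            System.out.println("ONE write + ONE read -> strongly consistent: "
                    + readsSeeLatestWrite(n, 1, 1));             // false (eventual)
        }
    }
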
