Registry use cases: Eran's point here is important, of course. Can we collectively articulate those? I suggest we focus on API use cases only (the capabilities we need to provide to gateways) rather than internal use cases (how the orchestrator and registry interact).
Marlon

On 2/24/14 1:40 AM, Eran Chinthaka Withana wrote:
> Hi Suresh,
>
> I will try to keep the focus of this mail thread on the object DB
> selection. But I will also share some comments about the architecture and
> the API since you mentioned those. Please feel free to spawn separate
> threads on those if we want to keep this thread focused on the object DB.
>
> Please see the comments in-line.
>
> Thanks,
> Eran Chinthaka Withana
>
>
> On Sun, Feb 23, 2014 at 2:20 PM, Suresh Marru <[email protected]> wrote:
>
>> Hi All,
>>
>> Airavata is actively migrating to use Thrift API for the RESTless design
>> and to facilitate various language bindings from client gateways. The
>> programming language support in thrift has been so far very encouraging.
>> The current architecture is looking like Figure 1 at [1].
>>
> Quick questions on the architecture. It seems like the API is directly
> contacting the Orchestrator to schedule workflows. I honestly think this is
> not a scalable approach due to the impedance mismatch of these two systems.
> Are we considering decoupling these two with a message queue and going for a
> worker-based architecture?
>
> Also, the "API Mapping Diagram" is hinting towards a "kind of" stateful
> service with a sequential set of steps. For example, due to the lack of a
> method to get all experiments, I assume the client is supposed to remember
> the experiment ids and invoke each of these methods in sequence. I'd
> encourage thinking in terms of stateless invocation where any client can
> invoke each of these methods without prior knowledge of the state of the
> execution.
>
>> Language specific clients will be released as thrift SDK's (similar to
>> evernote sdk's [1]). These clients will be integrated into gateway portals
>> which connect to the API Server. The API operations broker the simple calls
>> into one or more backend CPI calls (Airavata internal component
>> interfaces).
>> An example set of mappings is illustrated in Figure 2 at
>> [1]. The current draft of thrift API for version 0.12 is at [3], please pay
>> attention to experiment model at [4].
>>
> Comments on thrift IDL
>
> 1. The input and output parameters do not have constraint specifiers
> (required vs optional) and are left to the default. This will be very
> challenging when we try to improve APIs in later versions, and it's a
> standard practice to ALWAYS have either optional or required as constraint
> specifiers.
>
> 2. Consider using typedefs to reduce repetitive names. For example,
> defining airavataErrors.InvalidRequestException as a type will help you to
> simply refer to that as InvalidRequestException.
>
> 3. Introduce a parameter for each method to get the API key. This will be
> helpful in the future to identify individual clients, enforce SLAs, log
> requests, etc.
>
>
>> For the persistent store, we had a few iterations of Airavata Registry
>> shifting from a legacy XRegistry to JackRabbit to now an OpenJPA based
>> registry. To allow the API and the associated data models to evolve, it
>> will be useful to explore object databases so we can store the serialized
>> version of thrift objects directly. But it will be nice to have all (or
>> most) of the fields queryable.
>
> FYI, we did a storage space analysis some time back and for smaller objects,
> the overhead of storing the object in thrift serialized form vs each
> attribute as a column is the same. Also, enabling compression on each column
> family will make the difference go away further. So, I'd first start with a
> fields-based object representation.
>
> Having said that, making each attribute part of a column doesn't make it
> queryable. We have to either create secondary indexes or do column slices
> and both of these are a bit expensive. So as always with NoSQL storage
> systems, we should know the queries ahead of time before even
> loosely defining storage schemas.
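
To make the point about knowing the queries ahead of time concrete, here is a minimal Python sketch. It is a toy in-memory stand-in for a column-family store, not Cassandra client code; the class name, the "user" and "status" columns, and the method names are all made up for illustration. It shows that a query planned in advance ("experiments by user") can be served from a hand-maintained index, while an unplanned one ("experiments by status") falls back to scanning every row.

```python
from collections import defaultdict


class ExperimentStore:
    """Toy in-memory stand-in for a column-family store (hypothetical)."""

    def __init__(self):
        self._rows = {}                    # row key -> dict of columns
        self._by_user = defaultdict(set)   # hand-maintained index: user -> row keys

    def put(self, experiment_id, columns):
        # Remove any stale index entry before overwriting the row.
        old = self._rows.get(experiment_id)
        if old is not None:
            self._by_user[old["user"]].discard(experiment_id)
        self._rows[experiment_id] = dict(columns)
        self._by_user[columns["user"]].add(experiment_id)

    def find_by_user(self, user):
        # Planned query: answered from the index, no scan needed.
        return sorted(self._by_user.get(user, ()))

    def find_by_status_scan(self, status):
        # Unplanned query: no index exists, so every row is inspected.
        return sorted(
            key for key, row in self._rows.items()
            if row.get("status") == status
        )
```

The cost of the index is paid on every write, which is why the set of queries needs to be agreed before the schema is.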
>
>
>> This calls for a more column-family design of any NoSQL approaches.
>>
>> Any recommendations for a registry architecture?
>>
> It will be easy to answer this question if you can list the use cases for
> the registry. I don't think most people on this list know all the use
> cases. I myself have a very faint memory :)
>
>
>> Quickly hacking through I find the following approach a viable one:
>> ZombieDB[5] over astyanax[6] which talks to Cassandra.
>
> Not sure why you picked Astyanax (despite it having originated from Netflix
> and boasting better performance than Hector due to its token range
> awareness). I'd rather pick Hector or Astyanax based on the performance
> numbers you get. We did some work on this earlier and came up with an
> abstraction over these two clients so that we can switch easily between
> them: https://github.com/WizeCommerce/hecuba
>
> In any case, I think it's a bit too early to talk about this.
>
> I haven't used ZombieDB before, but before we pick any technology I'd spend
> a bit more time listing the use cases.
>
>
>> Airavata can benefit immediately from the replication and reliability of
>> cassandra and scalability in near future. Some of the model objects like
>> experiment creation will need to have strong consistency and most of the
>> monitoring can live with eventual consistency.
>>
> Cassandra, even though it is supposed to compromise C for AP (from the CAP
> theorem), has knobs (like read and write consistency levels) we can
> use to make it strongly consistent. So I think we are covered here.
>
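
The consistency-level knobs mentioned above follow a common rule of thumb: if the number of replicas that must answer a read (R) plus the number that must acknowledge a write (W) exceeds the replication factor (N), every read overlaps at least one replica holding the latest acknowledged write. A small Python sketch of that arithmetic follows; the function names are hypothetical and this is not driver code.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for the given replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when R + W > N, i.e. read and write replica sets must overlap."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum reads with quorum writes overlap; ONE/ONE does not.
    print(q, reads_see_latest_write(q, q, n), reads_see_latest_write(1, 1, n))
```

Under this rule, experiment creation could use quorum reads and writes, while monitoring data could use a single replica on each side and accept eventual consistency.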
