Hi Dimuthu,
First of all, I must say this is really impressive. The way you grasped the
problem and built a working prototype in such a short span is phenomenal. I
haven’t yet installed the prototype, but I did look through the design
document.
Here are some thoughts/questions. Please feel free to comment or correct
anything I’ve misunderstood.
* You are assuming a “message-oriented” approach to performing distributed
task execution, rather than using a third-party framework (e.g., Helix).
* Granted, this gives more control in managing workloads, but it also adds
the overhead of:
* (1) creating, defining, and persisting DAGs,
* (2) orchestrating through the DAG (although you aim to leverage the
message broker),
* (3) managing the state of these DAGs as they progress, and
* (4) handling errors related to workflow execution and to the broker
dependency.
* I did not understand how you are defining the DAGs. I remember when we
were trying to solve this problem in class using the MQ approach (similar, but
using RabbitMQ instead of Kafka), one of the challenges was allowing a way to
dynamically create DAGs and persist them if necessary. How are you defining
DAGs in your prototype?
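For concreteness, this is roughly the kind of DAG definition I have in mind.
It is only a sketch with made-up names, not a guess at what your prototype
actually does:

    import java.util.List;

    // Hypothetical sketch: a DAG as a list of tasks with explicit
    // dependencies, which could be serialized (e.g., to JSON) and
    // persisted if necessary.
    class TaskNode {
        String taskId;
        String taskType;          // e.g., ENV_SETUP, DATA_STAGING, JOB_SUBMISSION
        List<String> dependsOn;   // taskIds of upstream tasks (the DAG edges)
    }

    class WorkflowDag {
        String workflowId;
        List<TaskNode> tasks;     // a task is runnable once all dependsOn tasks finish
    }

Something along these lines would let users create DAGs dynamically and let
the orchestrator persist and resume them.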
* There might be situations when a workflow needs to be re-run (say,
something went wrong initially, corrections were needed, the target resource
was down, etc.). Does the design accommodate these scenarios?
* One of the bigger goals is to be able to orchestrate Airavata components,
and not just the tasks involved in an experiment. As I understand, the design
relies on messaging to orchestrate, but is messaging going to be the only
communication paradigm within Airavata microservices? If two components
communicate via Thrift, how does the architecture handle them?
* Another point that comes to mind is moving away from this big
“API Server” block and segregating into smaller, service-level, first-class
SDKs. For example, we now have a first-class “Profile Service” which allows
isolated interactions pertaining to Users, Tenants, and Groups. We might want
to keep these SDKs/services as independent as possible, which also means no
reliance on messaging. Will the architecture support these SDKs?
* As Marlon pointed out, and as we looked at last Spring, there is the
question of “database-per-microservice”. Currently we have a Registry which is
shared among different components like the Orchestrator and GFac. Ideally,
each microservice would own its database, and for any intersections in data
between microservices, we would sync up using events (messages). I can see the
Kafka broker coming in handy for this.
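As a sketch of what I mean (the topic name and payload below are made up for
illustration), each service would commit to its own database first and then
publish a change event for other services to consume:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ExperimentEventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // After committing to its own database, the service announces the
                // change; other services consume this topic to sync their copies.
                producer.send(new ProducerRecord<>("experiment-events", "exp-123",
                        "{\"type\":\"EXPERIMENT_CREATED\",\"experimentId\":\"exp-123\"}"));
            }
        }
    }

Keying the event by experiment id would also keep all events for one
experiment ordered within a partition.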
* As far as possible, we would like to adopt a generic method to
define/maintain/execute the 3 types of workflows (or maybe more):
* External – user defined multi-application experiment; which would mean
a parent experiment constituting child experiments.
* Internal – component-level workflows between Airavata microservices.
One such use case I can think of is dynamic resource binding, e.g.,
provisioning a container (as the target resource) if it does not exist, or
spinning up a VM and deploying an application at runtime.
* Experiment – typical experiment level task execution workflows needed
to complete the experiment.
I apologize for this lengthy email, and I really appreciate the work you did.
Some of these points might not make sense, so I would encourage discussion on
the mailing list. Keep up the good work!
Thanks and Regards,
Gourav Shenoy
From: "Pierce, Marlon" <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, November 1, 2017 at 5:22 PM
To: "[email protected]" <[email protected]>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 2
- Initial Prototype
Hi Dimuthu,
Thanks for sending this very thoughtful document. A couple of comments:
* Use of Kafka instead of RabbitMQ is interesting. Can you say more about how
this approach can handle Kafka client failures? For RabbitMQ, for example,
there is the simple “Work Queue” approach in which the broker pushes a task to
a worker. The task remains in queue until the worker sends an acknowledgement
that the job has been handled, not just received. “Handled” may mean for
example that the job has been submitted to an external batch scheduler over
SSH, which may require some retries, etc. If the worker crashes before the
job has been submitted, then the broker can resend the message to another
worker. I’m wondering how your Kafka-based solution would handle the same
issue.
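For reference, in Kafka terms the behavior I am asking about would mean
disabling auto-commit and committing offsets only after a task is fully
handled, along these lines (a sketch with illustrative names, not taken from
your prototype):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TaskWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "task-workers");
            props.put("enable.auto.commit", "false"); // do not ack on receipt
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("task-queue"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        handleTask(record.value()); // e.g., submit the job over SSH
                    }
                    // Commit only after handling; if the worker crashes first,
                    // the uncommitted messages are re-delivered after a rebalance.
                    consumer.commitSync();
                }
            }
        }

        private static void handleTask(String task) {
            System.out.println("Handling task: " + task);
        }
    }

Whether this gives the same guarantee as RabbitMQ’s per-message
acknowledgement, especially across consumer-group rebalances, is exactly what
I would like to understand.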
* A simpler but more common failure is communicating with external resources. A
task executor may need to SSH to a remote resource, which can fail (the
resource is slow to communicate, usually). How do you handle this case?
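Concretely, I would expect something like a bounded retry with backoff around
the SSH call; the sketch below uses hypothetical helper names:

    import java.io.IOException;

    public class SshRetry {
        // Hypothetical sketch: retry a flaky SSH submission a few times with
        // growing backoff, then surface the failure to the orchestrator.
        static void submitWithRetry(String task) throws InterruptedException {
            int maxRetries = 3;
            long backoffMs = 5_000;
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                try {
                    submitJobOverSsh(task);              // hypothetical SSH helper
                    return;                              // success
                } catch (IOException e) {
                    if (attempt == maxRetries) {
                        publishTaskFailedEvent(task, e); // hypothetical failure event
                    } else {
                        Thread.sleep(backoffMs * attempt);
                    }
                }
            }
        }

        static void submitJobOverSsh(String task) throws IOException {
            // placeholder for the real SSH submission
        }

        static void publishTaskFailedEvent(String task, Exception e) {
            // placeholder: notify the orchestrator so it can re-queue or fail the task
        }
    }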
* Your design focuses on Airavata’s experiment execution handling. Airavata’s
registry is another important component: this is where experiment objects get
persistently stored. The registry stores metadata about both “live” experiments
that are currently executing as well as archived experiments that have
completed.
How would you extend your architecture to include the registry?
Marlon
From: "[email protected]" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, October 30, 2017 at 10:45 AM
To: "[email protected]" <[email protected]>
Subject: Linked Container Services for Apache Airavata Components - Phase 2 -
Initial Prototype
Hi All,
Based on the analysis of Phase 1, within the past two weeks I have been
working on implementing a task execution workflow following the microservices
deployment pattern, with Kubernetes as the deployment platform.
Please find attached the design document, which explains the components and
the messaging interactions between them. Based on that design, I have
implemented the following components:
1. A set of microservices that compose the workflow
2. A simple Web Console to deploy and monitor workflows on the framework
I used Kafka as the primary messaging medium to communicate among the
microservices due to its simplicity and powerful features like partitions and
consumer groups.
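To give a flavour of how consumer groups help here (a simplified sketch; the
topic and group names below are examples, not the exact ones in the
prototype):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WorkerBootstrap {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // All replicas of one microservice share a group.id, so Kafka spreads
            // the topic's partitions across them for load balancing, while another
            // microservice with its own group.id still sees every message.
            props.put("group.id", "data-staging-service");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("task-events"));
        }
    }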
I have attached a user guide so that you can install and try this on your
local machine. The source code for each component can be found at [1].
Please share your ideas and suggestions.
Thanks
Dimuthu
[1]
https://github.com/DImuthuUpe/airavata/tree/master/sandbox/airavata-kubernetes
[2]
https://docs.google.com/document/d/1R1xrmuPldHiWVDn4xNVay9Vnxn9FODQZXtF55JxJpSY/edit?usp=sharing
[3]
https://docs.google.com/document/d/1A5eRIZiuUj4ShZVMS0NdAxjAxtOTZXculaYDCZ7IMQ8/edit?usp=sharing