Hi Hitesh,

Overall this is a good design. I have a few areas that need further
clarification.

1. This design basically supports one-way communication: Airavata sends
commands and agents execute them. But we have scenarios where agents should
respond to the commands. For example, Airavata sends a list-files command
and the agent should respond with the list of files. There could also
be cases where the response is asynchronous, so that Airavata does not
get the response immediately. How do you handle such scenarios?
2. When you are implementing queues in the external server, do you keep one
queue per compute resource or do you utilize a single queue for all compute
resources?
3. Can we have multiple external servers for high availability? If so, how
do you coordinate among the multiple external servers?
4. Did you consider other queue implementations like Kafka? If so, what
advantage do you get by using RabbitMQ over them?
5. We might have to write the same agents in different languages (Python, C,
Java) depending on what the compute resource supports. Please verify that
the client libraries that you use for queue interactions support that.
6. What is the process for registering or removing a compute resource from
the intranet (creating or deleting queues), and who is responsible for that?
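
To make the scenario in question 1 concrete, here is a minimal sketch of an
asynchronous request/reply exchange matched by a correlation id. It is only
an illustration under stated assumptions: the two in-memory queues stand in
for queues on the external server, and the names command_queue, reply_queue,
agent_worker, send_command and collect_reply are hypothetical and not taken
from the design document.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for the broker queues; a real setup
# would use queues hosted on the external server instead.
command_queue = queue.Queue()  # Airavata -> agent
reply_queue = queue.Queue()    # agent -> Airavata


def agent_worker():
    """Agent side: take one command, execute it, reply with the same id."""
    command = command_queue.get()
    if command["action"] == "list_files":
        # Placeholder result instead of a real directory listing.
        result = ["a.txt", "b.txt"]
    else:
        result = None
    reply_queue.put(
        {"correlation_id": command["correlation_id"], "result": result}
    )


def send_command(action, pending):
    """Airavata side: publish a command and remember its correlation id."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = action
    command_queue.put({"correlation_id": correlation_id, "action": action})
    return correlation_id


def collect_reply(pending, timeout=5.0):
    """Airavata side: match an incoming reply to the command that caused it."""
    reply = reply_queue.get(timeout=timeout)
    action = pending.pop(reply["correlation_id"])
    return action, reply["result"]


def demo():
    pending = {}
    agent = threading.Thread(target=agent_worker)
    agent.start()
    send_command("list_files", pending)
    # Airavata is free to do other work here; the reply arrives later.
    action, result = collect_reply(pending)
    agent.join()
    return action, result, pending
```

The point of the correlation id is that Airavata does not have to block on
each command; it can pick up replies whenever they arrive and still know
which command each one belongs to.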

Thanks
Dimuthu

On Thu, Feb 8, 2018 at 6:03 PM, Hitesh Kumar Dasika <hdas...@umail.iu.edu>
wrote:

> Dev,
>
> I am looking at a mechanism that can be used to establish a communication
> architecture between a set of *intranet* nodes in a cluster and Airavata.
>
> *Problem Introduction:*
>
> There are some cases wherein a cluster or an HPC system contains nodes or
> machines in the intranet and these cannot be accessed through the HPC
> System's endpoints directly. But, these systems inside the intranet can
> communicate with the external world or Internet. These machines are also
> precious resources that can be used for Job Executions. Hence there needs
> to be a proper architecture in place to make use of those resources. Here
> is a brief architectural discussion on this particular Problem.
>
>
> *Google Doc Link:*
> https://docs.google.com/document/d/11I5mboZmI_D_IocP-CfjJiNoD55qtVSLcpWGodAL0z0/edit?usp=sharing
>
