Hi Hitesh,

This is a good design overall. I have a few areas that need further clarification.

1. This design basically supports one-way communication: Airavata sends commands and agents execute them. But we have scenarios where agents should respond to the commands. For example, Airavata sends a list-files command and the agent should respond with the list of files. There could also be cases where the response is asynchronous, so that Airavata does not get the response immediately. How do you handle such scenarios?

2. When you implement queues in the external server, do you keep one queue per compute resource, or do you use a single queue for all compute resources?

3. Can we have multiple external servers for high availability? If so, how do you coordinate among them?

4. Did you consider other queue implementations like Kafka? If so, what advantage do you get by using RabbitMQ over them?

5. We might have to write the same agents in different languages (Python, C, Java) depending on what the compute resource supports. Please verify that the client libraries you use for queue interactions support that.

6. What is the process for registering or removing a compute resource from the intranet (creating or deleting queues), and who is responsible for it?

Thanks
Dimuthu

On Thu, Feb 8, 2018 at 6:03 PM, Hitesh Kumar Dasika <hdas...@umail.iu.edu> wrote:

> Dev,
>
> I am looking at a Mechanism which can be used to establish a communicating
> Architecture between a set of *intranet* nodes in a cluster and Airavata.
>
> *Problem Introduction:*
>
> There are some cases wherein a cluster or an HPC system contains nodes or
> machines in the intranet and these cannot be accessed through the HPC
> System's endpoints directly. But, these systems inside the intranet can
> communicate with the external world or Internet. These machines are also
> precious resources that can be used for Job Executions. Hence there needs
> to be a proper architecture in place to make use of those resources. Here
> is a brief architectural discussion on this particular Problem.
>
>
> *Google Doc Link :*
> https://docs.google.com/document/d/11I5mboZmI_D_IocP-CfjJiNoD55qtVSLcpWGodAL0z0/edit?usp=sharing
>