Hi Dev,
I have attached an architecture diagram for the monitoring system. Currently, the main challenge we are facing is that GFac is heavily tied to the monitoring system via task execution; the ultimate goal is to separate the two. I understand Marlon doesn't want me looking at the code too much, to avoid bias, so I have only glanced at a few specifics (a couple of lines) to get an idea of how monitoring is currently implemented in Airavata.

Previously, I had been working on parsing the job status emails: PBS, Slurm, UGE, etc. Over the weekend, I ran some performance metric tests on the current parsing code, as Shameera suggested. The current code is well balanced for large scale processing: it parses the emails quickly while staying simple. I rewrote a couple of lines without using regex, but the result proved highly unmaintainable. As Shameera/Marlon pointed out, these email formats change relatively frequently as servers/machines are upgraded or replaced, so it is important for this code to remain highly maintainable (sketch 1 below shows the kind of regex-based parsing I mean).

In addition, I have been working with Supun to develop a new architecture for the email monitoring. At first there was a debate on whether to use ZooKeeper and/or Redis to hold the global state, so I did some research on the pros and cons of each technology. As Suresh/Gourav pointed out, Airavata currently uses ZooKeeper, and ZooKeeper would also add less overhead than a database such as Redis. A big problem with this strategy is the complexity of the code we would have to write. In the scenario of multiple GFacs, a global ZooKeeper makes some sense; however, a problem arises if a job is cancelled. That can cause edge cases where, say, GFac A accidentally processes GFac B's emails. We would therefore need a careful, low-level implementation of locks governing who may access which data, which could prove to be a hassle (sketch 2 below).

Another potential solution is a work queue, similar to our job submission in Airavata, which delegates the work of parsing/reading emails across multiple GFacs. This could avoid the lock/thread hazards (sketch 3 below). If a GFac fails somehow, there still needs to be a mechanism to handle the emails that GFac was handed. We have to decide on the right design before the code can be implemented.

I have also been working on the Thrift/RabbitMQ scenario, where data is parsed, serialized, and then sent over the network (sketch 4 below). I will upload the code today or tomorrow.

SHOUT OUT @Marcus!
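Sketch 1: regex-based status email parsing. Purely illustrative; the subject line format is my assumption of what a Slurm notification looks like, not necessarily what our parsers actually receive.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlurmSubjectParser {
    // Assumed subject format (illustrative only):
    // "SLURM Job_id=1234 Name=myjob Ended, Run time 00:05:00, COMPLETED, ExitCode 0"
    private static final Pattern SUBJECT = Pattern.compile(
            "Job_id=(\\d+).*?,\\s*([A-Z_]+),\\s*ExitCode\\s+(\\d+)");

    public static void main(String[] args) {
        Matcher m = SUBJECT.matcher(
                "SLURM Job_id=1234 Name=myjob Ended, Run time 00:05:00, COMPLETED, ExitCode 0");
        if (m.find()) {
            // prints: jobId=1234 state=COMPLETED exit=0
            System.out.println("jobId=" + m.group(1)
                    + " state=" + m.group(2) + " exit=" + m.group(3));
        }
    }
}

The point is maintainability: when a machine upgrade changes the subject line, only the pattern (and a test string) has to change.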
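Sketch 2: a per-job ZooKeeper lock. I'm using Apache Curator's InterProcessMutex here only because it is the shortest way to show the idea; whether we would actually pull in Curator, and the znode layout, are both my assumptions, not decisions.

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class EmailLockSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        String jobId = "1234"; // hypothetical job id parsed from the email
        // One lock znode per job: only the GFac holding the lock may process
        // that job's emails, so GFac A cannot touch GFac B's jobs.
        InterProcessMutex lock = new InterProcessMutex(
                client, "/airavata/monitor/locks/" + jobId);
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                // parse + process this job's email here
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}

Even in this tiny form you can see the hassle: every access path has to remember to acquire/release, and a cancelled job raises the question of who cleans up the znode.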
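Sketch 3: a RabbitMQ work queue in the style of our job submission. The queue name and host are made up; the interesting parts are basicQos(1) and the manual ack. If a GFac dies before acking, the broker redelivers that email to another GFac, which is exactly the failure-handling mechanism I mentioned above.

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.io.IOException;

public class EmailWorkQueueSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: broker location
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Durable queue holding one message per incoming status email.
        channel.queueDeclare("monitor.emails", true, false, false, null);
        channel.basicQos(1); // hand each GFac one unacked email at a time

        channel.basicConsume("monitor.emails", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                    AMQP.BasicProperties props, byte[] body) throws IOException {
                // parse/process the raw email bytes here ...
                // ack only after success; if this GFac fails first,
                // the broker requeues the message for another GFac
                getChannel().basicAck(envelope.getDeliveryTag(), false);
            }
        });
    }
}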
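Sketch 4: the Thrift serialization side. TSerializer/TDeserializer with the binary protocol is standard Thrift; the actual struct we would serialize is still undecided, so the helpers below just accept any Thrift-generated TBase.

import org.apache.thrift.TBase;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class ThriftPayloadSketch {
    // Turn any Thrift-generated struct into bytes we can publish to RabbitMQ.
    public static byte[] toBytes(TBase<?, ?> message) throws TException {
        return new TSerializer(new TBinaryProtocol.Factory()).serialize(message);
    }

    // Fill a fresh struct instance back in on the consuming GFac side.
    public static void fromBytes(TBase<?, ?> target, byte[] payload) throws TException {
        new TDeserializer(new TBinaryProtocol.Factory()).deserialize(target, payload);
    }
}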
