Hey Dev,

For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata
and working on the email monitoring problem. I went through the
Curator/ZooKeeper code to explore the internal workings of Helix.

One question I had: what is the difference between the external view and the
current state? I understand that Helix uses the resource model to maintain
both the ideal state and the current state, so why is an external view
necessary? In addition, what is the purpose of a spectator node? The
documentation states that a "spectator" reacts to changes in a distributed
system. Why give that node limited abilities when you could give it full
access? These questions may be important to consider when writing the Helix
paper for submission.

As for the mailing/monitoring system, I
have decided to move forward with the JavaMail API + IMAP implementation. I
used the [email protected] (Gmail) address as the basis for running my test
code. For this use case I didn't use the Gmail API because its functions and
libraries offered limited capabilities. I played around with the Gmail API,
but I couldn't get it to work in a clean and efficient manner, so I decided to
use the JavaMail API, pulled in as an imported library. I chose IMAP over
POP3 because it has greater capabilities; POP3 was inefficient when fetching
the emails.

In terms of first
reading the emails, the first challenge was to set up the code correctly to
read from Gmail. Previously, the issue was that all emails were being read
every time the read() function was called in the Inbox class: every message
would be pulled even if only one email was unread. This proved very
time-consuming, as the scigap email address has 10000+ emails at any given
time. I set up boolean flags to distinguish emails that have been read from
ones that are unread. As a result, not every message has to be pulled; only
the ones with a "false" flag need to be read. These messages are pulled and
put into a Message[] array, which is then ordered using a lambda expression,
since JavaMail retrieves the most recent message last. After the messages in
the array have been dealt with, they are marked as "read" to avoid reading
them again.

Currently, I'm working on improving the implementations of
all four email parsers. It is important that these parsers run efficiently,
as many emails will be read. I didn't want to use regex, as it tends to be
slightly slower than plain string operations. For my demo code, I have used
string operations to parse the subject title/content. In a production
implementation, an array or the StringBuilder class should be used to improve
speed. Currently, I'm refactoring the PBS code to run a bit more optimally
and running test cases for the other two email types.

Below is a link for the Gmail
implementation + SLURM interpreter. The idea is to have four classes, one per
email type, that parse the messages from the Message[] array. The common data
collected (job_id, name, status, time) then goes into a Thrift data model
file. From that Thrift definition, a Java Thrift object is created and sent
over an AMQP message queue (RabbitMQ), to then potentially be stored in a
MySQL/SQL database. As of now the database part is not clear, but it would
most likely be a registry that needs to be updated via the Java JPA library or
SQL queries.
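
To make the "common data" idea concrete, here is a minimal sketch of pulling
job_id, name, status and time out of a SLURM notification subject with plain
string operations (indexOf/substring) rather than regex. The class and field
names (SlurmSubjectParser, JobEvent) are hypothetical and are not taken from
the linked repository, and the subject layout shown in the comment is an
assumption about what the scheduler sends; real subjects may differ.

```java
import java.util.Optional;

/** Hypothetical parser for SLURM notification subjects; illustrative only. */
final class SlurmSubjectParser {

    /** Common fields shared across scheduler emails (names are assumptions). */
    static final class JobEvent {
        final String jobId;
        final String name;
        final String status;
        final String time; // empty string when the subject carries no time

        JobEvent(String jobId, String name, String status, String time) {
            this.jobId = jobId;
            this.name = name;
            this.status = status;
            this.time = time;
        }
    }

    // Assumed subject layout:
    //   SLURM Job_id=12345 Name=test_job Ended, Run time 00:01:40, COMPLETED, ExitCode 0
    static Optional<JobEvent> parse(String subject) {
        final String idKey = "Job_id=";
        final String nameKey = " Name=";
        final String timeKey = " time ";

        int idStart = subject.indexOf(idKey);
        if (idStart < 0) return Optional.empty();
        idStart += idKey.length();

        int nameKeyAt = subject.indexOf(nameKey, idStart);
        if (nameKeyAt < 0) return Optional.empty();
        String jobId = subject.substring(idStart, nameKeyAt);

        // The job name is taken to end at the next space.
        int nameStart = nameKeyAt + nameKey.length();
        int nameEnd = subject.indexOf(' ', nameStart);
        if (nameEnd < 0) return Optional.empty();
        String name = subject.substring(nameStart, nameEnd);

        // The status word ("Began", "Ended", ...) runs to the next comma or the end.
        int statusEnd = subject.indexOf(',', nameEnd + 1);
        if (statusEnd < 0) statusEnd = subject.length();
        String status = subject.substring(nameEnd + 1, statusEnd).trim();

        // Optional "Run time HH:MM:SS" or "Queued time HH:MM:SS" segment.
        String time = "";
        int timeAt = subject.indexOf(timeKey, statusEnd);
        if (timeAt >= 0) {
            int timeStart = timeAt + timeKey.length();
            int timeEnd = subject.indexOf(',', timeStart);
            if (timeEnd < 0) timeEnd = subject.length();
            time = subject.substring(timeStart, timeEnd).trim();
        }
        return Optional.of(new JobEvent(jobId, name, status, time));
    }
}
```

Note that this sketch assumes job names contain no spaces; a name with spaces
would need a different end marker. The returned JobEvent is the kind of value
that could later be copied into the Thrift object.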


Code: https://github.com/chessman179/gmailtestinged
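
For readers who don't want to open the repository, the unread-only fetch
described above can be sketched roughly as follows. This is not the code from
the link: MailItem is a hypothetical stand-in for javax.mail.Message so the
example runs without the JavaMail dependency, and the "read" boolean plays the
role that the SEEN flag would play in JavaMail.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a mail message; a hypothetical type, not javax.mail.Message. */
final class MailItem {
    final String subject;
    final Instant receivedAt;
    boolean read; // false means the message still needs to be processed

    MailItem(String subject, Instant receivedAt, boolean read) {
        this.subject = subject;
        this.receivedAt = receivedAt;
        this.read = read;
    }
}

final class UnreadFetcher {
    /** Returns the unread items, newest first, and marks each one as read. */
    static List<MailItem> takeUnread(List<MailItem> inbox) {
        List<MailItem> unread = new ArrayList<>();
        for (MailItem m : inbox) {
            if (!m.read) {
                unread.add(m); // skip everything already flagged as read
            }
        }
        // Lambda comparator: a later receivedAt sorts earlier in the list.
        unread.sort((a, b) -> b.receivedAt.compareTo(a.receivedAt));
        for (MailItem m : unread) {
            m.read = true; // avoid picking the same message up again
        }
        return unread;
    }
}
```

Calling takeUnread twice on the same inbox returns an empty list the second
time, which is the behaviour that keeps the 10000+ already-read emails from
being pulled again.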





** big shout out to Marcus --
