Hi Apoorv,

For all coding suggestions, I always suggest you fork the Airavata sandbox 
repository and submit a pull request from your repo. That way you have a 
provenance of your contributions to a major open source foundation. Moreover, 
a PR is easier to review and give feedback on than a whole repo.

This is great work though. I hope Shamaeera can review and provide feedback; 
he has the most experience with this topic and its associated pragmatic 
issues. 

Suresh

> On Jul 17, 2017, at 11:30 AM, Apoorv Palkar <[email protected]> wrote:
> 
> Hey Dev,
> 
> For the past 3-3.5 weeks, I've been investigating the use of Helix in 
> Airavata and working on the email monitoring problem. I went through the 
> Curator/ZooKeeper code to test out the internal workings of Helix. A 
> particular question I had was: what is the difference between the external 
> view and the current state? I understood that Helix uses the resource model 
> to maintain both the ideal state and the current state. Why is it necessary 
> to have an external view? In addition, what is the purpose of a spectator 
> node? The documentation states that a "spectator" reacts to changes in a 
> distributed system. Why give this particular node limited abilities when you 
> can give it full access? These questions may be important to consider when 
> writing the Helix paper for submission.
> 
> As for the 
> mailing/monitoring system, I have decided to move forward with the JavaMail 
> API + IMAP implementation. I used the [email protected] (Gmail) address as 
> a basis for running my test code. For this particular use case, I didn't use 
> the Gmail API because it had limited capabilities in terms of 
> function/library use. I played around with the Gmail API; however, I was 
> unsuccessful in getting it to work in a clean and efficient manner. As such, 
> I decided to use the JavaMail API provided via imported libraries. IMAP was 
> chosen because it has greater capabilities than POP3, which was inefficient 
> when fetching the emails.
> 
> In terms of reading the emails, the first challenge was to set up the code 
> correctly to read from Gmail. 
> Previously the issue was that all emails were being read every time the 
> read() function was called in the Inbox class. This meant that every message 
> would be pulled even if only one email was unread. This proved to be highly 
> time-consuming, as the scigap email address has 10000+ emails at any given 
> time. I set up boolean flags marking which emails were read and which were 
> unread. As a result, not all messages have to be pulled; only the ones with 
> a "false" flag need to be read. These messages were pulled and then put into 
> a Message[] array. This array was then ordered using a lambda expression, as 
> JavaMail retrieves the most recent message last. After these messages are 
> put into the array and dealt with, they are marked as "read" to avoid 
> reading them again.
> 
> Currently, I'm working on improving the implementations 
> of all four email parsers. It is important to make sure these parsers run 
> efficiently, as many emails will be read. I didn't want to use regex, as it 
> is slightly slower than plain string operations. For my demo code, I have 
> used string operations to parse the subject title/content. In a production 
> implementation, an array or the StringBuilder class should be used to 
> improve speed. I'm also refactoring the PBS code to run a bit more optimally 
> and running test cases for the other two email types.
> 
> Below is a link to the Gmail implementation + SLURM interpreter. Basically, 
> the idea is to have 4 classes, one per email type, that parse the messages 
> from the Message[] array. The COMMON data collected, such as job_id, name, 
> status, and time, is then put into a Thrift data model file. From this 
> Thrift model, a Java Thrift object is created and sent over an AMQP message 
> queue, RabbitMQ, to then potentially be stored in a MySQL/SQL database. As 
> of now, the database part is not clear, but it would most likely be a 
> registry that needs to be updated via a Java JPA library or SQL queries. 
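
The unread-flag handling described in the quoted message can be sketched 
roughly as below. This is only an illustration using plain Java: the Mail 
class and the pullUnread method are hypothetical stand-ins, not JavaMail 
types and not the code in the linked repository. With JavaMail itself, a 
comparable filter would likely be a FlagTerm on Flags.Flag.SEEN passed to 
Folder.search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mail message; not a JavaMail type.
final class Mail {
    final String subject;
    final long receivedMillis; // arrival time, epoch milliseconds
    boolean read;              // false = not yet processed

    Mail(String subject, long receivedMillis, boolean read) {
        this.subject = subject;
        this.receivedMillis = receivedMillis;
        this.read = read;
    }
}

final class UnreadMailSketch {
    // Returns only the unread messages, newest first, and marks each as read.
    static List<Mail> pullUnread(List<Mail> inbox) {
        List<Mail> unread = new ArrayList<>();
        for (Mail m : inbox) {
            if (!m.read) {
                unread.add(m);
            }
        }
        // Lambda comparator: the larger (newer) timestamp sorts first.
        unread.sort((a, b) -> Long.compare(b.receivedMillis, a.receivedMillis));
        for (Mail m : unread) {
            m.read = true; // avoid pulling the same message again
        }
        return unread;
    }

    public static void main(String[] args) {
        List<Mail> inbox = new ArrayList<>();
        inbox.add(new Mail("old, already handled", 1000L, true));
        inbox.add(new Mail("job 1 began", 2000L, false));
        inbox.add(new Mail("job 1 ended", 3000L, false));

        for (Mail m : pullUnread(inbox)) {
            System.out.println(m.subject);
        }
        // A second call finds nothing new, since everything is now marked read.
        System.out.println(pullUnread(inbox).size());
    }
}
```

The point of the sketch is that a second call does no work for messages that 
were already handled, which is what matters for an inbox with 10000+ emails.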
> 
> https://github.com/chessman179/gmailtestinged   <<< code
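
For the string-operation parsing mentioned above, a minimal sketch might look 
like the following. The subject layout used here ("SLURM Job_id=... Name=... 
Ended, Run time ..., ...") is an assumed sample, and the class and method 
names are hypothetical; the real parsers in the linked repository may differ. 
It also assumes the job name contains no spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SlurmSubjectSketch {
    // Returns the text after `key` up to the next `end` (or to the end of
    // the string), or null if `key` does not occur.
    static String between(String s, String key, String end) {
        int start = s.indexOf(key);
        if (start < 0) {
            return null;
        }
        start += key.length();
        int stop = s.indexOf(end, start);
        return (stop < 0 ? s.substring(start) : s.substring(start, stop)).trim();
    }

    // Extracts the common fields using indexOf/substring only (no regex).
    static Map<String, String> parse(String subject) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("job_id", between(subject, "Job_id=", " "));
        fields.put("name", between(subject, "Name=", " "));

        // Assumed layout: the status word follows the name, up to a comma.
        String status = null;
        int nameIdx = subject.indexOf("Name=");
        int sp = nameIdx < 0 ? -1 : subject.indexOf(' ', nameIdx);
        if (sp >= 0) {
            int comma = subject.indexOf(',', sp);
            status = (comma < 0 ? subject.substring(sp + 1)
                                : subject.substring(sp + 1, comma)).trim();
        }
        fields.put("status", status);

        // Matches "Run time ..." or "Queued time ..." in the assumed layout.
        fields.put("time", between(subject, "time ", ","));
        return fields;
    }

    public static void main(String[] args) {
        String subject = "SLURM Job_id=12345 Name=test_job Ended, "
                + "Run time 00:01:30, COMPLETED, ExitCode 0";
        System.out.println(parse(subject));
    }
}
```

A map of these common fields is the kind of value that could then be copied 
into the Thrift data model before being sent to the message queue.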
> 
> 
> ** big shout out to Marcus --
