Hi, Dileepa.
Just a few questions to help validate the model.
It is still not clear to me why the Reputation entity has a relationship with EmailContact as well as with Email. The EmailContact relationship could always be derived from the Email's sender (an EmailContact), so unless you are explicitly modeling that derived relationship, it shouldn't appear.
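To illustrate what I mean by "derived", here is a minimal Java sketch. The class shapes, field names and accessor names are assumptions made for illustration only (they are not taken from the diagram or from any RB code), and Isis/JDO annotations are left out on purpose. Reputation stores a reference to its Email only; the contact is reached by navigating through the email's sender.

```java
// Hypothetical sketch: names and fields are illustrative assumptions, not RB code.
final class EmailContact {
    private final String address;

    EmailContact(String address) { this.address = address; }

    String getAddress() { return address; }
}

final class Email {
    private final EmailContact sender;

    Email(EmailContact sender) { this.sender = sender; }

    EmailContact getSender() { return sender; }
}

final class Reputation {
    private final Email email;   // the only stored association
    private final double score;

    Reputation(Email email, double score) {
        this.email = email;
        this.score = score;
    }

    Email getEmail() { return email; }

    double getScore() { return score; }

    // Derived relationship: navigates through the email rather than
    // storing a second reference to the contact.
    EmailContact getContact() { return email.getSender(); }
}
```

With this shape there is no second association to keep consistent; whether the diagram should still show the derived link is a modeling choice.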
HTH,
Oscar
On 25/03/2014, at 21:20, Dileepa Jayakody <[email protected]> wrote:
Hi Dan and all,
Here is the basic class diagram for the domain entities in RB: http://yuml.me/825d7db5
Please note that I have used the name EmailContact instead of EmailSenderProfile for clarity. Effectively, this entity represents the email contacts in the user's inbox.
Each email and email contact will have a corresponding Reputation entity. And in the view models, EmailReputationViewModel will display emails with their reputation data and ContactReputationViewModel will display email contacts with their reputation data in the RB web application.
Your ideas and suggestions are most welcome.
Thanks, Dileepa
On Tue, Mar 25, 2014 at 3:42 PM, Dileepa Jayakody <[email protected]> wrote:
Hi Dan,
Thanks a lot for your insight. Please see my comments inline below.
On Tue, Mar 25, 2014 at 1:21 PM, Dan Haywood <[email protected]> wrote:
Hi Dileepa,
I've just posted the comments below on your GSOC proposal. I know that you can't make further changes to the proposal, so I'm posting them here on the dev list, so we can keep the conversation going.
So..
* good to see you intend to set up a project on github for this; please do this asap. That way you can start to capture docs/working notes. I also suggest that you set up github pages for your site [1].
* What I'd like to see right now is some sort of UML diagram; you could sketch one using yuml.me [2] and add it to your github site. I can't quite work out how the persistent domain entities relate to each other. In particular, are EmailSenderProfile and Reputation in 1-1 correspondence?
I will draw an ER diagram for the domain entities and we can refine it as we discuss. Yes, I pictured EmailSenderProfile as the representation of an email sender (a contact), and each email sender will have a corresponding reputation score (accumulated and normalized over the emails sent by that sender), represented by the Reputation domain entity.
* In your timeline I noticed you said "Commit all code to github", only on Aug 11. It's much better practice (and will help mentors guide you) if you commit changes as you go. That way it's also safely backed up, and you can go back in time if you mess up.
Yes, I agree. In fact, I didn't mean that I'm going to commit all the code at once on Aug 11; I meant that I'm planning to finish development and have everything committed by Aug 11. I strongly agree on getting feedback along the way; after all, I'm looking at using agile development for my project :). Sorry for expressing my idea in a misleading way in the proposal.
* You might also want to version control the academic paper, too, if your university lets you.
Some further points relating to the design:
* You have Email as a persistent entity. I'm a bit worried what that might mean about storage and also synchronization. Is it necessary to have the Email persisted in Isis? If not persisted, then should the Email entity be a view model, or a fake persistent entity utilizing a new StoreManager impl in JDO? See the recent thread [3] on this topic.
The Email entity will have several attributes, such as id, sender-id and reputation-score. sender-id will be mapped to the EmailSenderProfile, and reputation-score will be a score given by the ML process evaluating the reputation of the email. Could the Email entity be a view model in this scenario? If so, what is the advantage of defining it as a view model?
I think we can discuss this further with an ER diagram for the application. I will come up with an ER diagram asap.
* Conversely, does Mahout require some sort of persistent dataset of emails in order to do the reputation scoring? Or does it just hold aggregated information? If the former, I worry that we now have each email stored in potentially 3 places: gmail, Isis and Mahout. Keeping these in sync would be a nightmare.
AFAIK the Mahout process requires a persistent dataset (file based or database based) to train the classifier, and it will build a classifier model (an aggregated information structure describing how to classify new data). Mahout will not persist the email data again. Therefore I feel Mahout will need access to the email dataset either straight from gmail as the datasource or from an Isis datasource (after retrieving all emails into Isis). If you think retrieving and storing all emails in Isis is not a good idea, maybe the EmailService can be implemented only as a connector from gmail > mahout.
* It occurs to me that you're going to need some entities to keep track of the high water mark of the most recently analyzed email, so that when you poll for new emails you know which to ask for. This high water mark is per user of RB. So I think you'll either need an entity to represent your RB User, or you could use the UserSettings service [4][5]
Yes, I will definitely need an entity to represent the RB User. In fact, user management will also be key in the application, since one user should not be able to access another user's email and reputation data. Thanks for the suggestions. Would it be a good idea to extend the UserSettings entity to represent RB-specific user data, or to have a separate entity for RB_User?
* In the proposal the term "reputation index" is associated with the email sender. Is that the same as "Reputation"?
Yes. By "building the reputation index" I meant that the initial reputation analysis process will generate the initial reputation scores for all past emails and create a Reputation profile for each EmailSender.
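As a rough illustration of the per-user high water mark discussed above, here is a minimal in-memory Java sketch. It assumes that each email carries a numeric identifier that only increases over time (IMAP UIDs behave this way within a mailbox); the class name, method names and the use of a plain map are hypothetical, and a real version would be a persistent entity or a UserSettings entry rather than a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks, per RB user, the highest email id already analyzed.
final class HighWaterMarks {
    private final Map<String, Long> lastAnalyzedIdByUser = new HashMap<>();

    // Returns 0 when nothing has been analyzed for this user yet.
    long get(String userId) {
        return lastAnalyzedIdByUser.getOrDefault(userId, 0L);
    }

    // Only ever moves the mark forward, so a replayed batch cannot rewind it.
    void advance(String userId, long analyzedId) {
        lastAnalyzedIdByUser.merge(userId, analyzedId, Long::max);
    }
}
```

When polling, the service would ask only for emails whose id is greater than `get(userId)`, and call `advance` once they have been analyzed.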
* The initial download of emails for analysis probably needs to be done in multiple batches (of say 100 at a time), in case there's a glitch/network issue.
Agreed. I think the Isis BackgroundService can be used for this?
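The batching idea can be sketched independently of Isis. The following Java helper is an assumption-laden illustration (the `Batches` class and `partition` method are hypothetical names, and no Isis or gmail API is used): it splits a list of email ids into consecutive batches of at most a given size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits work into fixed-size batches.
final class Batches {
    private Batches() { }

    // Returns consecutive sublists of at most batchSize elements, in order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 250 ids and a batch size of 100 this yields batches of 100, 100 and 50, so a failed batch can be retried on its own without repeating the earlier ones.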
* I was interested to note that you see the Isis webapp as being an email client itself. I suggest you keep it as read-only, though... otherwise you'll end up reinventing all of gmail (not advisable, I think).
Yes, I would have the webapp as a read-only, demo-purpose application, basically as a presentation layer for the view model EmailReputationViewModel, to display the recent emails and their reputation information as well as the reputation profiles of the email senders.
* One of the first tasks you've set yourself (til 21 Apr) is to "try out Apache wicket samples [10] to learn how to develop the presentation layer of the application". In fact, with Isis you don't need to do any presentation layer coding; start building out your prototype and you'll see what I mean.
I wanted to try out Apache Wicket to get an understanding of Wicket's configuration and programming model for developing view models. :)
* I'm still unsure about OAuth integration. The EmailService is going to require credentials to access gmail, and that's "within" the Isis domain model. But Shiro/buji-pac4j sits in front of Isis. If Shiro has done the OAuth sign-in, then I guess it'll be necessary to surface those credentials somehow to the EmailService (perhaps using Shiro's org.apache.shiro.SecurityUtils#getSubject() method). Perhaps the best thing is to get buji-pac4j done, then see what information is surfaced that way.
Yes, this requires a bit of research. I wanted to implement RB as a web application that doesn't ask for the user's email credentials to perform the reputation analysis. In the worst case, it will require the user's email credentials for the EmailService's email retrieval process.
In summary, thanks a lot for your insight into the project. I will set up a github project and come up with an ER diagram asap.
Thanks, Dileepa
HTH Dan
[1] http://pages.github.com/ [2] http://yuml.me/ [3] http://isis.markmail.org/thread/lsg3uywlfjviztzi [4] http://isis.apache.org/reference/services/settings-services.html [5] http://isis.apache.org/components/objectstores/jdo/services/settings-services-jdo.html
Óscar Bou Bou, Product Manager; Lead Auditor for ISO 27001 Certification at BSI; CISA, CRISC, APMG ISO 20000, ITIL-F
902 900 231 / 620 267 520
http://www.twitter.com/oscarbou
http://es.linkedin.com/in/oscarbou
http://www.GesConsultor.com
