Hi, Dileepa.

Just a few questions to help validate the model.

Why not a variation like this?
http://yuml.me/edit/825d7db5

It's still not clear to me why the Reputation entity has a relationship with EmailContact as well, and not only with Email.
The EmailContact relationship could always be derived from the Email's sender (an EmailContact), so unless you're explicitly modeling that derived relationship, it shouldn't appear.
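
As a rough sketch of what I mean by "derived" (plain Java, no Isis or JDO annotations): only Email holds a Reputation, and the contact's score is computed from the emails it sent. The entity names come from the diagram; the fields, the method names and the simple averaging rule are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch. Entity names are from the diagram; everything else
// (fields, methods, averaging rule) is a hypothetical illustration.

class Reputation {
    private final double score;

    Reputation(double score) {
        this.score = score;
    }

    double getScore() {
        return score;
    }
}

class Email {
    private final EmailContact sender;
    private final String subject;
    private final Reputation reputation;

    Email(EmailContact sender, String subject, Reputation reputation) {
        this.sender = sender;
        this.subject = subject;
        this.reputation = reputation;
    }

    EmailContact getSender() {
        return sender;
    }

    String getSubject() {
        return subject;
    }

    Reputation getReputation() {
        return reputation;
    }
}

class EmailContact {
    private final String address;
    private final List<Email> sentEmails = new ArrayList<>();

    EmailContact(String address) {
        this.address = address;
    }

    String getAddress() {
        return address;
    }

    // Creates an Email sent by this contact and records it.
    Email send(String subject, Reputation reputation) {
        Email email = new Email(this, subject, reputation);
        sentEmails.add(email);
        return email;
    }

    // Derived, not stored: computed from the reputations of the sent emails.
    // Returns 0.0 when no emails have been sent (an arbitrary default).
    double getDerivedReputationScore() {
        return sentEmails.stream()
                .mapToDouble(e -> e.getReputation().getScore())
                .average()
                .orElse(0.0);
    }
}
```

With this shape there is no stored Reputation-to-EmailContact link to keep in sync; if the accumulated score turns out to be expensive to compute, that would be a reason to model the relationship explicitly.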

HTH,

Oscar




On 25/03/2014, at 21:20, Dileepa Jayakody <[email protected]> wrote:

Hi Dan and all,

Here is the basic class diagram for the domain entities in RB:
http://yuml.me/825d7db5

Please note that I have used the name EmailContact instead of
EmailSenderProfile for clarity. Effectively, this entity represents
the email contacts in the user's inbox.

Each email and each email contact will have a corresponding Reputation entity.
As for the view models, EmailReputationViewModel will display emails with
their reputation data, and ContactReputationViewModel will display email
contacts with their reputation data, in the RB web application.

Your ideas and suggestions are most welcome.

Thanks,
Dileepa


On Tue, Mar 25, 2014 at 3:42 PM, Dileepa Jayakody <[email protected]> wrote:

Hi Dan,

Thanks a lot for your insight. Please see my comments inline below.


On Tue, Mar 25, 2014 at 1:21 PM, Dan Haywood <[email protected]> wrote:

Hi Dileepa,

I've just posted the comments below on your GSOC proposal.  I know that
you can't make further changes to the proposal, so I'm posting them here on
the dev list, so we can keep the conversation going.

So..

* good to see you intend to set up a project on github for this; please
do this asap.  That way you can start to capture docs/working notes.  I
also suggest that you set up github pages for your site [1].


* What I'd like to see right now is some sort of UML diagram; you could
sketch one using yuml.me [2] and add it to your github site.  I can't
quite work out how the persistent domain entities relate to each other.  In
particular, are EmailSenderProfile and Reputation in 1-1 correspondence?


I will draw an ER diagram for the domain entities and we can refine it
through discussion.
Yes I pictured EmailSenderProfile as the representation of an email sender
(a contact) and each email sender will have a corresponding reputation
score (accumulated and normalized reputation-score over the emails sent by
him) represented by the Reputation domain entity.



* In your timeline I noticed you said "Commit all code to github", only
on Aug 11.  It's much better practice (and will help mentors guide you) if
you commit changes as you go.  That way it's also safely backed up, and you
can go back in time if you mess up.


Yes, I agree. In fact I didn't mean that I'm going to commit all the code at
once on Aug 11. I meant to say that I'm planning to finish development and
have everything committed by Aug 11.
I strongly agree on getting feedback throughout development; after
all, I'm planning to use agile development for my project :). Sorry for
having expressed my idea in a misleading way in the proposal.


* You might also want to version control the academic paper, too, if your
university lets you.


Some further points relating to the design:

* You have Email as a persistent entity.  I'm a bit worried what that
might mean about storage and also synchronization.  Is it necessary to have
the Email persisted in Isis?  If not persisted, then should the Email
entity be a view model, or a fake persistent entity utilizing a new
StoreManager impl in JDO?  See the recent thread [3] on this topic.

The Email entity will have several attributes, such as id, sender-id and
reputation-score. sender-id will be mapped to the EmailSenderProfile, and
reputation-score will be a score given by the ML process evaluating the
reputation of the email. Could the Email entity be a view model in this
scenario? If so, what is the advantage of defining it as a view model?

I think we can discuss this further with an ER diagram for the application.
I will come up with an ER diagram asap.


* Conversely, does Mahout require some sort of persistent dataset of
emails in order to do the reputation scoring?  Or does it just hold
aggregated information?  If the former, I worry that we now have each email
stored in potentially 3 places: gmail, Isis and Mahout.  Keeping these in
sync would be a nightmare.


AFAIK the Mahout process requires a persistent dataset (file based or database
based) to train the classifier, and it will build a classifier-model (an
aggregated information structure on how to classify new data). Mahout will
not persist the email data again.
Therefore I feel Mahout will need access to the email dataset either
straight from gmail as the datasource or from an Isis datasource (after
retrieving all emails into Isis).
If you think retrieving and storing all emails in Isis is not a good idea,
maybe the EmailService can be implemented only as a connector from gmail to
Mahout.


* It occurs to me that you're going to need some entities to keep track
of the high water mark of the most recently analyzed email, so that when
you poll for new emails you know which to ask for.  This high water mark is
per user of RB.  So I think you'll either need an entity to represent your
RB User, or you could use the UserSettings service [4][5]
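
A minimal sketch of such a per-user high water mark, in plain Java and held in memory only. In RB this state would need to be persisted (in an RB user entity or via the UserSettings service); the class and method names are hypothetical, and the email id is assumed to be a number that grows with arrival order.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch only; names are hypothetical and persistence is omitted.
class HighWaterMarks {
    // Key: RB user name. Value: id of the most recently analyzed email,
    // assumed here to increase with arrival order.
    private final Map<String, Long> lastAnalyzedByUser = new HashMap<>();

    // Returns 0 for a user whose mailbox has not been analyzed yet.
    long get(String userName) {
        return lastAnalyzedByUser.getOrDefault(userName, 0L);
    }

    // Moves the mark forward; a lower id never moves it backwards.
    void advance(String userName, long analyzedEmailId) {
        lastAnalyzedByUser.merge(userName, analyzedEmailId, Long::max);
    }
}
```

When polling, the EmailService would ask only for emails with an id above `get(userName)` and call `advance` once a batch has been analyzed.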


Yes, I will definitely need an entity to represent the RB User. In
fact, user management will also be key in the application, since one
user should not be able to access another user's email or reputation data.
Thanks for the suggestions. Would it be a good idea to extend the
UserSettings entity to represent RB-specific user data, or to have a separate
entity for RB_User?



* In the proposal the term "reputation index" is associated with
the email sender.  Is that the same as "Reputation"?


Yes. By "building the reputation index" I meant that the initial reputation
analysis process will generate the initial reputation scores for all past
emails and create a Reputation profile for each EmailSender.


* The initial download of emails for analysis probably needs to be done
in multiple batches (of say 100 at a time), in case there's a
glitch/network issue.
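
A rough sketch of the batching idea in plain Java. The class name, the method names and the retry policy are assumptions for illustration; the actual fetch from gmail is left to a caller-supplied callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits email ids into batches and retries a failed batch.
class BatchDownloader {

    // Splits ids into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    // Hands each batch to fetchBatch, retrying a failing batch until it has
    // been attempted maxAttempts times, then rethrowing the last failure.
    // Returns the number of batches completed.
    static <T> int downloadAll(List<T> ids, int batchSize, int maxAttempts,
                               Consumer<List<T>> fetchBatch) {
        int completed = 0;
        for (List<T> batch : partition(ids, batchSize)) {
            for (int attempt = 1; ; attempt++) {
                try {
                    fetchBatch.accept(batch);
                    completed++;
                    break;
                } catch (RuntimeException e) {
                    if (attempt >= maxAttempts) {
                        throw e;
                    }
                }
            }
        }
        return completed;
    }
}
```

A glitch then costs at most one batch of work, provided the callback is safe to repeat for the same batch.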


Agreed. I think the Isis BackgroundService can be used for this?



* I was interested to note that you see the Isis webapp as being an email
client itself.  I suggest you keep it as read-only, though... otherwise
you'll end up reinventing all of gmail (not advisable, I think).


Yes, I would have the webapp as a read-only, demo-purpose application,
basically as a presentation layer for the view model:
EmailReputationViewModel will display the recent emails and their reputation
information, as well as the reputation profiles of the email senders.


* One of the first tasks you've set yourself (til 21 Apr) is to "try out
Apache wicket samples [10] to learn how to develop the presentation layer
of the application".  In fact, with Isis you don't need to do any
presentation layer coding; start building out your prototype and you'll see
what I mean.


I wanted to try out Apache Wicket to get an understanding of the Wicket
configuration and programming model for developing view models. :)


* I'm still unsure about oAuth integration.  The EmailService is going to
require credentials to access gmail, and that's "within" the Isis domain
model.  But Shiro/buji-pac4j sits in front of Isis.  If Shiro has done the
oAuth sign-in, then I guess it'll be necessary to surface those credentials
somehow to the EmailService (perhaps using Shiro's
org.apache.shiro.SecurityUtils#getSubject() method).  Perhaps the best thing
is to get buji-pac4j done, then see what information is surfaced that way.

Yes, this requires a bit of research. I wanted to implement RB as a
web application which doesn't ask for the user's email credentials to perform
the reputation analysis process. In the worst case it will require the
user's email credentials to perform the EmailService's email retrieval
process.

In summary, thanks a lot for your insight into the project. I will set up a
github project and come up with an ER diagram asap.

Thanks,
Dileepa



HTH
Dan

[1] http://pages.github.com/
[2] http://yuml.me/
[3] http://isis.markmail.org/thread/lsg3uywlfjviztzi
[4] http://isis.apache.org/reference/services/settings-services.html
[5] http://isis.apache.org/components/objectstores/jdo/services/settings-services-jdo.html





Óscar Bou Bou


