Thanks a lot for the detailed information, Phil.

On Fri, Nov 15, 2013 at 1:06 AM, Phillip Rhodes <motley.crue....@gmail.com> wrote:

> > I'm really interested about your concept on combining data sources:
> > email, facebook, linkedin and other SNSs to do semantic analysis.
> > I'm doing my MSc research on email reputation management which
> > requires semantic analysis of email data.
> > Please share more info, links about those topics if you have.
>
> In the current version, it's a fairly naive approach. We download the
> respective Tweet, Email, Blog Post, or whatever using the appropriate
> protocol, and then use Tika and Boilerpipe to extract the raw text
> (that is mostly for web content, with email and tweet, the raw text is
> already available) and then push that text to Stanbol, making no
> distinction between a tweet or an email or whatever.
>
> When we get the entity graph back from Stanbol, we store all the
> triples and add Statements which link the discrete entities with the
> UUID we assign to each piece of content (eg, tweet, email, blog post,
> etc.) and now we can look for commonality by just using plain old
> SPARQL queries.
>
> What would be more interesting, and what we'll work on eventually, is
> adding more "smarts" to the actual process of doing the enhancement on
> the Stanbol side. This could be especially useful for something like
> a Tweet where you don't have much context to work with... but a
> TweetEnhancmentEngine could be "smarter" and dereference the profile
> of the user who posted the tweet, any @mention's, any hyperlinks,
> etc., and factor that in. Likewise for email, where you could factor
> in knowledge about the sender and the recipient(s) of the mail.
>
> Regarding email research... you probably already know about this, but
> just in case you don't - a lot of researchers use the "Enron
> Corpus"[1] for doing research on extracting information from email,
> since it's A. large, B. real-world and C. legally available. I could
> imagine how some social network analysis combined with something like
> the semantic concept extraction that Stanbol does, applied to a body
> of emails, could be part of a system for doing something related to
> reputation.
>
> [1]: https://www.cs.cmu.edu/~enron/
>
>
> Phil
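
To check my understanding of the linking step described above (assign a UUID to each piece of content, link the extracted entities to it, then look for commonality), here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the `MENTIONS` predicate, the helper names, and the example entity URIs are all made up by me, the triples live in an in-memory set rather than a real triple store, and the SPARQL string is only what I imagine an equivalent query might look like, not the one actually used.

```python
import uuid
from collections import defaultdict

# Hypothetical predicate linking a content item to an entity found in it.
MENTIONS = "http://example.org/vocab#mentions"

# Roughly equivalent SPARQL, assuming the same hypothetical predicate.
# Shown for comparison only; it is not executed by this sketch.
SHARED_ENTITIES_QUERY = """
SELECT ?entity (COUNT(DISTINCT ?content) AS ?n)
WHERE { ?content <http://example.org/vocab#mentions> ?entity }
GROUP BY ?entity
HAVING (COUNT(DISTINCT ?content) > 1)
"""


def link_content(store, entities):
    """Assign a UUID to one piece of content and link each entity to it."""
    content_id = "urn:uuid:" + str(uuid.uuid4())
    for entity in entities:
        store.add((content_id, MENTIONS, entity))
    return content_id


def shared_entities(store):
    """Return the entities mentioned by more than one content item."""
    by_entity = defaultdict(set)
    for subject, predicate, obj in store:
        if predicate == MENTIONS:
            by_entity[obj].add(subject)
    return {e: ids for e, ids in by_entity.items() if len(ids) > 1}


# Example entity URIs are illustrative, not real enhancement output.
store = set()
tweet = link_content(store, ["http://dbpedia.org/resource/Enron"])
email = link_content(store, ["http://dbpedia.org/resource/Enron",
                             "http://dbpedia.org/resource/Houston"])
common = shared_entities(store)
```

Here `common` maps each shared entity to the set of content UUIDs that mention it, which is the kind of overlap between a tweet and an email that I understand the SPARQL queries are meant to surface.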