> I'm really interested about your concept on combining data sources: email,
> facebook, linkedin and other SNSs to do semantic analysis. I'm doing my MSc
> research on email reputation management which requires semantic analysis of
> email data.
> Please share more info, links about those topics if you have.

In the current version, it's a fairly naive approach. We download the respective tweet, email, blog post, or whatever using the appropriate protocol, and then use Tika and Boilerpipe to extract the raw text (that is mostly for web content; with email and tweets, the raw text is already available). We then push that text to Stanbol, making no distinction between a tweet, an email, or anything else. When we get the entity graph back from Stanbol, we store all the triples and add statements which link the discrete entities with the UUID we assign to each piece of content (e.g., tweet, email, blog post, etc.). From there we can look for commonality by just using plain old SPARQL queries.

What would be more interesting, and what we'll work on eventually, is adding more "smarts" to the actual process of doing the enhancement on the Stanbol side. This could be especially useful for something like a tweet, where you don't have much context to work with... but a TweetEnhancementEngine could be "smarter" and dereference the profile of the user who posted the tweet, any @mentions, any hyperlinks, etc., and factor that in. Likewise for email, where you could factor in knowledge about the sender and the recipient(s) of the mail.

Regarding email research... you probably already know about this, but just in case you don't: a lot of researchers use the "Enron Corpus"[1] for doing research on extracting information from email, since it's (a) large, (b) real-world, and (c) legally available. I could imagine how some social network analysis, combined with something like the semantic concept extraction that Stanbol does, applied to a body of emails, could be part of a system for doing something related to reputation.

[1]: https://www.cs.cmu.edu/~enron/

Phil
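
P.S. In case a concrete picture helps, here is a rough Python sketch of the "link entities to a content UUID, then look for commonality" step described above. It is only an illustration, not our actual code: the triple store is a plain Python set, the `mentions` predicate and the `link_entities` / `shared_entities` helpers are hypothetical names, and the entity URIs are hand-written stand-ins for whatever Stanbol would return. The SPARQL string at the end is roughly the same question phrased as a query, again assuming that made-up predicate.

```python
import uuid
from collections import defaultdict

# Hypothetical predicate; the real vocabulary is not stated in this mail.
MENTIONS = "http://example.org/vocab#mentions"


def link_entities(store, content_id, entity_uris):
    """Add one (content, MENTIONS, entity) statement per extracted entity."""
    for entity in entity_uris:
        store.add((content_id, MENTIONS, entity))


def shared_entities(store):
    """Return {entity: sorted content ids} for entities linked to 2+ items."""
    by_entity = defaultdict(set)
    for subject, predicate, obj in store:
        if predicate == MENTIONS:
            by_entity[obj].add(subject)
    return {e: sorted(ids) for e, ids in by_entity.items() if len(ids) > 1}


# An in-memory stand-in for the triple store.
store = set()

# Each piece of content gets its own UUID, whatever its type.
tweet_id = "urn:uuid:" + str(uuid.uuid4())
email_id = "urn:uuid:" + str(uuid.uuid4())

# Hand-written stand-ins for the entities an enhancer might return.
link_entities(store, tweet_id, ["http://dbpedia.org/resource/Enron"])
link_entities(store, email_id, ["http://dbpedia.org/resource/Enron",
                                "http://dbpedia.org/resource/Houston"])

common = shared_entities(store)

# Roughly the same question as SPARQL (predicate is hypothetical).
SPARQL = """
SELECT ?entity ?a ?b WHERE {
  ?a <http://example.org/vocab#mentions> ?entity .
  ?b <http://example.org/vocab#mentions> ?entity .
  FILTER (STR(?a) < STR(?b))
}
"""
```

Here `common` ends up with a single key, the Enron URI, mapped to both content ids, since that is the only entity the tweet and the email share. Note that nothing in the lookup cares whether an id belongs to a tweet or an email, which matches the "no distinction" approach above.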