> I'm really interested in your concept of combining data sources (email,
> Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my MSc
> research on email reputation management, which requires semantic analysis of
> email data.
> Please share more info and links about those topics if you have any.

In the current version, it's a fairly naive approach.  We download the
respective tweet, email, blog post, or whatever using the appropriate
protocol, then use Tika and Boilerpipe to extract the raw text (that
step is mostly for web content; for email and tweets the raw text is
already available).  We then push that text to Stanbol, making no
distinction between a tweet, an email, or anything else.
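
As a rough sketch of that last step: the request to Stanbol looks the
same whatever the source type was.  This assumes a local Stanbol
launcher with the enhancer reachable at http://localhost:8080/enhancer
(adjust for your setup); the function name and sample text are made up
for illustration, and the request is only built here, not sent.

```python
import urllib.request

# Assumption: a local Stanbol launcher with the enhancer at /enhancer.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhance_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST of plain text to the enhancer.

    The same request is used for tweets, emails and blog posts alike,
    which is the "no distinction" part of the naive approach.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",  # ask for the entity graph as Turtle
        },
        method="POST",
    )


request = build_enhance_request("Paris is the capital of France.")
# To actually send it: urllib.request.urlopen(request).read()
```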

When we get the entity graph back from Stanbol, we store all the
triples and add statements that link the discrete entities to the
UUID we assign to each piece of content (e.g., tweet, email, blog
post).  From there we can look for commonality with plain old SPARQL
queries.
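
To make the linking idea concrete, here is a minimal in-memory sketch,
not our actual code.  The "mentions" predicate, the helper names and
the sample entities are all hypothetical, and a plain Python set
stands in for the triple store.

```python
import uuid
from collections import defaultdict

# Hypothetical predicate linking a content item to an extracted entity.
MENTIONS = "http://example.org/mentions"


def link_content(triples, content_id, entity_uris):
    """Add one (content, mentions, entity) statement per extracted entity."""
    for entity in entity_uris:
        triples.add((content_id, MENTIONS, entity))


def shared_entities(triples):
    """Return {entity: sorted content ids} for entities found in 2+ items."""
    by_entity = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == MENTIONS:
            by_entity[obj].add(subject)
    return {e: sorted(ids) for e, ids in by_entity.items() if len(ids) > 1}


triples = set()
tweet_id = "urn:uuid:" + str(uuid.uuid4())  # one UUID per piece of content
email_id = "urn:uuid:" + str(uuid.uuid4())
link_content(triples, tweet_id, ["http://dbpedia.org/resource/Paris"])
link_content(triples, email_id, ["http://dbpedia.org/resource/Paris",
                                 "http://dbpedia.org/resource/France"])
common = shared_entities(triples)

# Roughly the same question as SPARQL, with the same made-up predicate:
#   SELECT ?entity ?a ?b WHERE {
#     ?a <http://example.org/mentions> ?entity .
#     ?b <http://example.org/mentions> ?entity .
#     FILTER (?a != ?b)
#   }
```

Here `common` holds only the Paris entity, mapped to both the tweet
and the email, since France appears in just one item.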

What would be more interesting, and what we'll work on eventually, is
adding more "smarts" to the actual enhancement process on the Stanbol
side.  This could be especially useful for something like a tweet,
where you don't have much context to work with: a
TweetEnhancementEngine could be "smarter" and dereference the profile
of the user who posted the tweet, any @mentions, any hyperlinks,
etc., and factor that in.  Likewise for email, where you could factor
in knowledge about the sender and the recipient(s) of the mail.
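
Just to illustrate the first step such an engine would need, here is a
small sketch that pulls out the things worth dereferencing from a
tweet.  Nothing here is Stanbol API; the function, the regular
expressions and the sample tweet are assumptions for illustration, and
real tweet parsing has more edge cases than this handles.

```python
import re

# Simplified patterns; assumed good enough for a sketch only.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")  # skips the "@" inside a@b.com
URL_RE = re.compile(r"https?://\S+")


def tweet_context_refs(author, text):
    """Collect the things a smarter engine might dereference for context."""
    return {
        "author": author,
        "mentions": MENTION_RE.findall(text),
        "links": URL_RE.findall(text),
    }


refs = tweet_context_refs(
    "alice", "Nice talk by @bob on Stanbol https://example.org/talk"
)
```

Each of those references (author profile, mentioned users, linked
pages) could then be fetched and its text fed in alongside the tweet.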

Regarding email research...  you probably already know about this, but
just in case you don't: a lot of researchers use the "Enron
Corpus"[1] for research on extracting information from email, since
it's (a) large, (b) real-world and (c) legally available.  I could
imagine some social network analysis, combined with something like
the semantic concept extraction that Stanbol does and applied to a
body of emails, being part of a system for doing something related
to reputation.

[1]: https://www.cs.cmu.edu/~enron/


Phil
