Thanks a lot for the detailed information, Phil.

On Fri, Nov 15, 2013 at 1:06 AM, Phillip Rhodes
<motley.crue....@gmail.com> wrote:

> > I'm really interested in your concept of combining data sources (email,
> > Facebook, LinkedIn and other SNSs) to do semantic analysis. I'm doing my
> > MSc research on email reputation management, which requires semantic
> > analysis of email data.
> > Please share more info and links about those topics if you have any.
>
> In the current version, it's a fairly naive approach.  We download the
> respective tweet, email, blog post, or whatever using the appropriate
> protocol, then use Tika and Boilerpipe to extract the raw text (that
> is mostly for web content; with email and tweets, the raw text is
> already available), and then push that text to Stanbol, making no
> distinction between a tweet, an email, or anything else.
>
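
A minimal sketch of that last step (pushing already-extracted text to
Stanbol), assuming a Stanbol instance on localhost:8080 that exposes its
enhancer at /enhancer and can return Turtle. The URL and the function names
build_enhancer_request and enhance are assumptions for illustration, not
taken from Phil's code, and the Tika/Boilerpipe extraction is left out.

```python
import urllib.request

# Assumption: a local Stanbol launcher with the enhancer at this path;
# adjust the URL for a real deployment.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build a POST request that sends plain text to the enhancer."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the enhancement graph serialized as Turtle.
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(text, url=STANBOL_ENHANCER_URL):
    """Send the text to Stanbol and return the response body as a string."""
    with urllib.request.urlopen(build_enhancer_request(text, url)) as response:
        return response.read().decode("utf-8")
```

The same function would be called for a tweet, an email body, or cleaned
web text, which matches the "no distinction" point above.
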
> When we get the entity graph back from Stanbol, we store all the
> triples and add statements that link the discrete entities to the
> UUID we assign to each piece of content (e.g., tweet, email, blog
> post).  We can then look for commonality using plain old SPARQL
> queries.
>
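
To make the linking idea concrete, here is a rough sketch that uses a plain
Python set of (subject, predicate, object) tuples in place of a real triple
store. The ex:mentions predicate, the urn:uuid: naming, and both helper
functions are hypothetical; the actual vocabulary used in the project is not
stated above.

```python
import uuid

# Hypothetical predicate linking a content item to an entity found in it.
MENTIONS = "http://example.org/vocab#mentions"


def link_content(store, entity_uris, content_id=None):
    """Add one (content, mentions, entity) statement per extracted entity."""
    content_uri = "urn:uuid:" + str(content_id or uuid.uuid4())
    for entity in entity_uris:
        store.add((content_uri, MENTIONS, entity))
    return content_uri


def shared_entities(store, content_a, content_b):
    """Return the entities that both content items are linked to."""
    def mentioned_by(content):
        return {o for s, p, o in store if s == content and p == MENTIONS}
    return mentioned_by(content_a) & mentioned_by(content_b)


# The same question phrased as SPARQL (prefix and predicate are assumed):
# which pairs of content items share an entity?
COMMON_ENTITIES_QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?entity ?a ?b WHERE {
  ?a ex:mentions ?entity .
  ?b ex:mentions ?entity .
  FILTER (STR(?a) < STR(?b))
}
"""
```

With a real store, the query string would be sent to its SPARQL endpoint
instead of calling shared_entities; the FILTER only keeps each pair once.
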
> What would be more interesting, and what we'll work on eventually, is
> adding more "smarts" to the actual process of doing the enhancement on
> the Stanbol side.  This could be especially useful for something like
> a tweet, where you don't have much context to work with.  A
> TweetEnhancementEngine could be "smarter" and dereference the profile
> of the user who posted the tweet, any @mentions, any hyperlinks,
> etc., and factor that in.  Likewise for email, where you could factor
> in knowledge about the sender and the recipient(s) of the mail.
>
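
For the tweet case, the first step would presumably be collecting the things
worth dereferencing. A small sketch of only that step, with no network calls
and no Stanbol engine API involved; tweet_context and the regular
expressions are my own assumptions, and the mention pattern is a rough
approximation rather than Twitter's exact rules.

```python
import re

# Rough patterns: a mention is '@' plus up to 15 word characters, not
# preceded by a word character (so 'a@b.com' is not treated as a mention).
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")
URL_RE = re.compile(r"https?://\S+")


def tweet_context(author, text):
    """Collect the references an enhancement step could later dereference."""
    links = URL_RE.findall(text)
    # Strip URLs first so an '@' inside a link is not mistaken for a mention.
    mentions = MENTION_RE.findall(URL_RE.sub(" ", text))
    return {"author": author, "mentions": mentions, "links": links}
```

The author profile, mentioned profiles, and linked pages could then be
fetched and their text passed along as extra context for the enhancement.
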
> Regarding email research...  you probably already know about this, but
> just in case you don't: a lot of researchers use the "Enron
> Corpus"[1] for doing research on extracting information from email,
> since it's (a) large, (b) real-world, and (c) legally available.  I could
> imagine how some social network analysis combined with something like
> the semantic concept extraction that Stanbol does, applied to a body
> of emails, could be part of a system for doing something related to
> reputation.
>
> [1]: https://www.cs.cmu.edu/~enron/
>
>
> Phil
>
