Re: [GSOC] FOAF Co-reference based Entity Disambiguation WorkFlow

Rupert Westenthaler Mon, 26 Aug 2013 05:30:21 -0700

Hi Dileepa,

Sorry for the late response. I was on vacation from 31 July, and after
coming back I overlooked your mail. Thanks for reminding me via IRC.



On Wed, Jul 31, 2013 at 11:06 PM, Dileepa Jayakody <
[email protected]> wrote:

> Hi All,
>
> As the third milestone of my project I will describe my initial design of
> the FOAF Co-reference based entity disambiguation engine here.
>
> The main disambiguation technique used here is FOAF co-reference. This
> aims to merge multiple fise:EntityAnnotations identified by different
> surface mentions in the text to a single FOAF entity by identifying
> co-reference relationships between the entity-labels. I have an idea to
> introduce 2 new fise properties called fise:coref, fise:not-coref to denote
> the coreference relationships between the entities. Would like your
> thoughts on this idea.
>
> Contextual information extracted from the ContentItem will be used to
> identify the most suitable entity-annotation to the selected-text context.
> The co-reference calculation will be initially done in a Rule-based
> manner. Later a machine learning approach (SVM based) will be followed to
> upgrade the system.
> The disambiguation engine will calculate a disambiguation score (ds) by
> performing FOAF co-reference operations on the contextual information
> extracted from the content items and modify the fise:confidence value of
> each entity suggestion.
>
> Basic co-reference rule to be used is :
> {?p a owl:IFP. ?a ?p ?x. ?b ?p ?x.} => {?a :coref ?b}
> {?p a owl:FP. ?a ?p ?x. ?a ?p ?y.} => {?x :coref ?y}
>
> IFP : inverse-functional property
> FP : functional property
> coref : co-referent
>

I would advise using "all" properties defined by the ontology, not only
the functional and inverse-functional ones (as indicated above).
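
For reference, the first quoted rule could be sketched roughly as below.
This is only an illustration in Python, not Stanbol code: the triple
representation, the function name coref_by_ifp and the hand-picked IFPS set
are assumptions. Properties that are not (inverse-)functional would need a
softer, score-based treatment instead of a hard rule like this one.

```python
from itertools import combinations

FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption: a small hand-picked set of inverse-functional FOAF properties.
IFPS = {FOAF + "mbox", FOAF + "mbox_sha1sum", FOAF + "homepage"}


def coref_by_ifp(triples, ifps=IFPS):
    """Apply {?p a IFP. ?a ?p ?x. ?b ?p ?x} => {?a :coref ?b}.

    `triples` is an iterable of (subject, predicate, object) strings.
    Returns a set of (a, b) pairs with a < b.
    """
    subjects_by_value = {}
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value.setdefault((p, o), set()).add(s)

    pairs = set()
    for subjects in subjects_by_value.values():
        # Every two subjects sharing the same IFP value are co-referent.
        for a, b in combinations(sorted(subjects), 2):
            pairs.add((a, b))
    return pairs
```
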


>
> The co-reference operations will be mainly 3 types.  These 3 types will be
> implemented as sub-modules in the disambiguation-engine. The
> Map<TextAnnotation,Set<Suggestions>> will go through each module (in a
> chain-mode) for improved disambiguation results.  The disambiguation
> process mainly aims at People disambiguation; it could also be used for
> Organization type disambiguation.
>
> The 3 sub-modules in the engine are as follows.
>
> 1. Co-reference by foaf-field literal matching :
> This will perform a similarity matching of entity-label fields with FOAF
> fields like foaf:name, givenName, firstName, familyName, nick  and update
> confidence values for co-referring entities (eg: matching firstName,
> givenName and nickname mentions in the content: 'Tim Berners-Lee' is also
> identified by the nickname 'timbl').
> It should also detect TextAnnotations of email addresses (if available)
> and match them with foaf:mbox,foaf:personalMailbox fields and phone numbers
> with foaf:phone.
> In this module, direct literal matching is performed.
>
>
Suggesting possible entities for mentions in the text is the responsibility
of EntityLinking. IMO this first part has little to do with
disambiguation; it is mainly needed to get the initial suggestions
(fise:TextAnnotation with linked fise:EntityAnnotation). Those annotations
will then be used for disambiguation in steps (2) and (3).

There are already EnhancementEngines that can be used for linking against
names, family names and nick names. For email addresses and phone numbers
you might need to write your own engines (maybe regex based).
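
A regex based detector for mail addresses and phone numbers could look
roughly like this. It is a minimal sketch in Python rather than an actual
EnhancementEngine; both patterns and the name find_mentions are assumptions
and deliberately loose, so real data would need stricter rules.

```python
import re

# Assumption: simplified patterns, not a full RFC 5322 / E.164 validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_mentions(text):
    """Return (kind, start, end, value) tuples ordered by start offset."""
    mentions = []
    for kind, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
        for match in pattern.finditer(text):
            mentions.append((kind, match.start(), match.end(), match.group()))
    return sorted(mentions, key=lambda m: m[1])
```

The offsets are what a text annotation would need to mark the selected
text; the values could then be compared with foaf:mbox and foaf:phone.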

I would not recommend creating fise:EntityAnnotations for given names, as
there will be way too many possibilities. With regard to that you should
have a look at the Entity co-mention engine. It is able to suggest "Tim
Berners-Lee" for mentions of "Tim" if "Tim Berners-Lee" was already
mentioned by its full name earlier in the text.
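
To illustrate the behaviour (not the actual implementation of the engine),
here is a small Python sketch. The function name and the plain token
comparison are assumptions made for the example only.

```python
def co_mention_suggestions(mentions):
    """Suggest earlier full names for later single-token mentions.

    `mentions` is a list of surface forms in text order. Returns a dict
    mapping the index of each single-token mention to the full names
    seen before it that contain this token.
    """
    seen_full_names = []
    suggestions = {}
    for index, mention in enumerate(mentions):
        tokens = mention.split()
        if len(tokens) > 1:
            # Multi-token mentions are treated as full names.
            seen_full_names.append(mention)
            continue
        token = mention.lower()
        suggestions[index] = [
            full for full in seen_full_names
            if token in (part.lower() for part in full.split())
        ]
    return suggestions
```

A short mention that appears before any matching full name gets an empty
suggestion list, which avoids creating annotations for every given name.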



> 2. Co-reference by relationship links :
> This module performs neighborhood comparison with other People,
> Organizations mentioned in the context. The relationships will be analysed
> via foaf:knows field. foaf:seeAlso, foaf:sameAs will be used as the main
> co-reference field to detect different EntityAnnotations referring to the
> same Entity.
> To detect relationships with organizations, foaf:schoolHomePage,
> foaf:workplaceHomePage will be used.
> To detect membership in groups, foaf:Group, foaf:member will be used as
> keys.
> To detect gender of the person in the context, the surface mention he/she
> will be matched against the foaf:gender.
>

> 3. Topic based matching :
> The fise:TopicAnnotations will be matched against foaf:interest (links to
> a document), foaf:TopicInterest (links to an agent/entity), foaf:topic and
> foaf:primaryTopic.
>
>
Those two suggestions look fine to me.



> I will use the same algorithm to calculate the disambiguation score as
> used in SolrMLT disambiguation engine in Stanbol.
>
> The algorithm:
>
>     dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))
>
>     oc ... original-confidence [0..1]
>     ds ... disambiguation-score [0..1]
>     dc ... disambiguated-confidence [0..1]
>     cw ... original-confidence-weight
>     dw ... disambiguation-weight
>
>
> Some questions I have:
>
>    - Is it a good idea to chain many enhancement engines other than my
>    foaf-site-engine such as NLP-Engines, TokenizerEngine, POSEngine to provide
>    many Entity Suggestions as possible before executing disambiguation?
>

IMHO (1) "Co-reference by foaf-field literal matching" as you named it -
or "Entity Linking" as I would call it - can be a combination of many
engines. POS tagging, Named Entity Recognition, EntityhubLinking engines
(configured for different foaf properties of your profiles), some regex
based engines for mail addresses and phone numbers, ... could all
contribute to this.

Also (2) and (3) could be done in multiple engines, but in that case you
would need to find a solution to correctly calculate the final
disambiguated fise:confidence value based on the individual results of the
different disambiguation engines. That means you would need to add some
intermediate information to the RDF enhancement graph. If you do it in a
single engine you could use a Java model for that. Because of that, IMO it
should be simpler to start with a single engine.
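
With a single engine, the per-module scores can stay in memory and be
merged before the formula quoted above is applied once. A rough sketch in
Python (illustration only): the weighted formula is the one from your
mail, while averaging the module scores and both function names are my
assumptions.

```python
def combine_module_scores(scores):
    """Merge per-module scores into one disambiguation score (ds).

    Assumption: a plain average; a weighted scheme may work better.
    """
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


def disambiguated_confidence(oc, ds, cw=1.0, dw=1.0):
    """dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))"""
    if not (0.0 <= oc <= 1.0 and 0.0 <= ds <= 1.0):
        raise ValueError("oc and ds must be within [0..1]")
    if cw < 0 or dw < 0 or cw + dw == 0:
        raise ValueError("weights must be non-negative and not both zero")
    total = cw + dw
    return oc * cw / total + ds * dw / total


# Hypothetical scores from modules (2) and (3) for one suggestion.
ds = combine_module_scores({"relations": 0.5, "topics": 1.0})
dc = disambiguated_confidence(oc=0.5, ds=ds, cw=1.0, dw=3.0)
```

Because the weights are normalised by (cw + dw), dc stays within [0..1] as
long as oc and ds do.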


>
>    - Can I use 'topic' enhancement engine in Stanbol to provide
>    fise:TopicAnnotations required in the 3rd module?
>

If you can train a model based on your data it should work. The engine
would give you topics for the parsed text, and your disambiguation engine
would compare the topics referenced by candidate FOAF profiles with the ones
detected by the TopicEngine for the text.
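
One simple way to do such a comparison would be a set overlap between the
two topic sets. The Python sketch below uses the Jaccard coefficient; that
choice, the function name and the topic identifiers are assumptions, not
something the TopicEngine provides.

```python
def topic_overlap(profile_topics, text_topics):
    """Jaccard overlap between the topics of a FOAF profile and the
    topics detected for the text, as a score within [0..1]."""
    profile, detected = set(profile_topics), set(text_topics)
    if not profile or not detected:
        return 0.0
    return len(profile & detected) / len(profile | detected)


# Hypothetical topic identifiers for one candidate profile and one text.
score = topic_overlap(
    {"semantic-web", "linked-data"},
    {"linked-data", "nlp"},
)
```

The result could serve as the topic part of the disambiguation score.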

But I am wondering where you can get the training data for such a model.
You would need a set of documents for all the categories used by FOAF files.


>
>    - Does SentimentAnalysis engine work? if so will
>    fise:SentimentAnnotations be useful for Topic based matching?
>

The sentiment engines do work, but I do not see how they can improve topic
based matching. Can you explain your intentions?

best
Rupert


>
>
> Would like your suggestions, ideas as much as possible to improve my FOAF
> co-reference based disambiguation engine.
>
> Below is a block diagram of the workflow.
>
> [image: Inline image 1]
>
> source :
> http://creately.com/diagram/example/hjs4yd0e1/FOAF_Disambiguation_WorkFlow
>
> Thanks,
> Dileepa
>
> Reference :
> 1. "Computing FOAF Co-reference Relations with Rules and Machine
> Learning",Jennifer Sleeman and Tim Finin, University of Maryland, Baltimore
> County, In proceedings of The Third International Workshop on Social Data
> on the Web, November 2010
>
>
>
>
>
>
>


-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
