Re: [GSOC] FOAF Co-reference based Entity Disambiguation WorkFlow

Dileepa Jayakody Tue, 27 Aug 2013 00:42:25 -0700

Hi Rupert and All,

On Mon, Aug 26, 2013 at 5:58 PM, Rupert Westenthaler <
[email protected]> wrote:


> Hi Dileepa,
>
> Sorry for the late response, but I was on vacation from 31st Jul and after
> coming back I had overlooked your mail. Thanks for reminding me via IRC.
>
>
> On Wed, Jul 31, 2013 at 11:06 PM, Dileepa Jayakody <
> [email protected]> wrote:
>
> > Hi All,
> >
> > As the third milestone of my project I will describe my initial design of
> > the FOAF Co-reference based entity disambiguation engine here.
> >
> > The main disambiguation technique used here is FOAF co-reference. This
> > aims to merge multiple fise:EntityAnnotations identified by different
> > surface mentions in the text to a single FOAF entity by identifying
> > co-reference relationships between the entity-labels. I have an idea to
> > introduce 2 new fise properties called fise:coref and fise:not-coref to
> > denote the coreference relationships between the entities. Would like
> > your thoughts on this idea.
> >
> > Contextual information extracted from the ContentItem will be used to
> > identify the most suitable entity-annotation for the selected-text
> > context.
> > The co-reference calculation will initially be done in a rule-based
> > manner. Later a machine learning approach (SVM based) will be followed to
> > upgrade the system.
> > The disambiguation engine will calculate a disambiguation score (ds) by
> > performing FOAF co-reference operations on the contextual information
> > extracted from the content items and modify the fise:confidence value of
> > each EntitySuggestion.
> >
> > Basic co-reference rule to be used is :
> > {?p a owl:IFP. ?a ?p ?x. ?b ?p ?x} => {?a :coref ?b}
> > {?p a owl:FP. ?a ?p ?x. ?a ?p ?y} => {?x :coref ?y}
> >
> > IFP : inverse-functional property
> > FP : functional property
> > coref : co-referent
> >
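
A minimal sketch of the two rules above in plain Python, assuming the data is available as (subject, property, object) tuples. The function name `coref_pairs` and the idea of passing the IFP and FP property sets in as arguments are illustrative assumptions, not part of Stanbol or of the proposed engine.

```python
from itertools import combinations


def coref_pairs(triples, ifps, fps):
    """Return the unordered co-referent pairs implied by the two rules.

    triples: iterable of (subject, property, object) tuples
    ifps:    set of properties treated as inverse-functional (assumption:
             supplied by the caller, e.g. read from the ontology)
    fps:     set of properties treated as functional (same assumption)
    """
    subjects_by_po = {}  # (property, object) -> subjects, for the IFP rule
    objects_by_sp = {}   # (subject, property) -> objects, for the FP rule
    for s, p, o in triples:
        if p in ifps:
            subjects_by_po.setdefault((p, o), set()).add(s)
        if p in fps:
            objects_by_sp.setdefault((s, p), set()).add(o)

    pairs = set()
    groups = list(subjects_by_po.values()) + list(objects_by_sp.values())
    for group in groups:
        # Every two distinct members of a group are co-referent.
        for a, b in combinations(sorted(group), 2):
            pairs.add(frozenset((a, b)))
    return pairs


example = [
    ("ex:a", "foaf:mbox", "mailto:t@example.org"),
    ("ex:b", "foaf:mbox", "mailto:t@example.org"),
    ("ex:doc", "foaf:primaryTopic", "ex:x"),
    ("ex:doc", "foaf:primaryTopic", "ex:y"),
    ("ex:c", "foaf:name", "Tim"),
]
result = coref_pairs(example, ifps={"foaf:mbox"}, fps={"foaf:primaryTopic"})
```

Here `ex:a` and `ex:b` share a value for an inverse-functional property, and `ex:x` and `ex:y` are both values of a functional property on the same subject, so each pair is reported as co-referent. The `foaf:name` triple contributes nothing because it is in neither set.
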
>
> I would advise using "all" properties defined by the ontology and not only
> the functional ones (as indicated above).
>
I guess you are referring to the properties defined in
propertyfilter.config during indexing?
Yes, I have configured it with foaf:* to include all foaf properties during
indexing.


> >
> > The co-reference operations will be mainly of 3 types. These 3 types will
> > be implemented as sub-modules in the disambiguation-engine. The
> > Map<TextAnnotation,Set<Suggestions>> will go through each module (in a
> > chain-mode) for improved disambiguation results. The disambiguation
> > process mainly aims at People disambiguation; it could also be used for
> > the Organization type disambiguation.
> >
> > The 3 sub-modules in the engine are as follows.
> >
> > 1. Co-reference by foaf-field literal matching :
> > This will perform a similarity matching of entity-label fields with FOAF
> > fields like foaf:name, givenName, firstName, familyName, nick and update
> > confidence values for co-referring entities (eg: matching firstName,
> > givenName and nickname mentions in the content: 'Tim Berners-Lee' is also
> > identified as 'timbl' as a nickname).
> > It should also detect TextAnnotations of email addresses (if available)
> > and match them with foaf:mbox, foaf:personalMailbox fields, and phone
> > numbers with foaf:phone.
> > In this module, direct literal matching is performed.
> >
> >
> Suggesting possible Entities for mentions in the text is the responsibility
> of EntityLinking. IMO this first part does not have much to do with
> disambiguation, but is mainly needed to get initial suggestions
> (fise:TextAnnotation with linked fise:EntityAnnotation). Those annotations
> will then be used for disambiguation in steps (2) and (3).
>
> There are already EnhancementEngines that can be used for linking against
> names, family names and nick names. For email and phone number you might
> need to write your own engines (maybe regex based).
>
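
A rough sketch of what regex-based detection of mail addresses and phone numbers could look like, written in plain Python rather than as a Stanbol EnhancementEngine. The patterns and the `find_mentions` helper are illustrative assumptions; they are deliberately simple and will miss or over-match some real-world formats.

```python
import re

# Simplified patterns (assumptions for illustration, not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_mentions(text):
    """Return (kind, start, end, surface_form) tuples ordered by position."""
    mentions = []
    for kind, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
        for match in pattern.finditer(text):
            mentions.append((kind, match.start(), match.end(), match.group()))
    return sorted(mentions, key=lambda m: m[1])


sample = "Contact tim@example.org or call +43 699 1234567."
found = find_mentions(sample)
```

The start and end offsets are kept so that each hit could later be turned into a text annotation over the selected span and compared with foaf:mbox or foaf:phone values.
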
> I would not recommend creating fise:EntityAnnotations for given names, as
> there will be far too many possibilities. With regard to that you should
> have a look at the Entity co-mention engine. It is able to suggest "Tim
> Berners-Lee" for mentions of "Tim" if "Tim Berners-Lee" was already
> mentioned by its full name earlier in the text.
>
Thanks for the pointer, I will take a look at the Entity co-mention engine
for this purpose.

>
>
> > 2. Co-reference by relationship links :
> > This module performs neighborhood comparison with other People and
> > Organizations mentioned in the context. The relationships will be
> > analysed via the foaf:knows field. foaf:seeAlso and foaf:sameAs will be
> > used as the main co-reference fields to detect different
> > EntityAnnotations referring to the same Entity.
> > To detect relationships with organizations, foaf:schoolHomePage and
> > foaf:workplaceHomePage will be used.
> > To detect membership in groups, foaf:Group and foaf:member will be used
> > as keys.
> > To detect the gender of the person in the context, the surface mention
> > he/she will be matched against foaf:gender.
> >
>
> > 3. Topic based matching :
> > The fise:TopicAnnotations will be matched against foaf:interest (links to
> > a document), foaf:TopicInterest (links to an agent/entity), foaf:topic
> > and foaf:primaryTopic.
> >
> >
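
The thread does not say how the topic match would be scored. As one possible reading, the sketch below uses the Jaccard overlap between the topics detected for the text and the topics referenced by a FOAF profile. Both the choice of Jaccard overlap and the name `topic_match_score` are assumptions made for illustration.

```python
def topic_match_score(text_topics, profile_topics):
    """Jaccard overlap in [0..1] between two collections of topic URIs.

    text_topics:    topics detected for the content (e.g. from topic
                    annotations)
    profile_topics: topics referenced by a candidate profile (e.g. values
                    of foaf:interest, foaf:topic, foaf:primaryTopic)
    """
    text_set = set(text_topics)
    profile_set = set(profile_topics)
    if not text_set or not profile_set:
        return 0.0  # nothing to compare, so no evidence either way
    return len(text_set & profile_set) / len(text_set | profile_set)


score = topic_match_score(
    ["ex:SemanticWeb", "ex:Linguistics"],
    ["ex:SemanticWeb", "ex:Databases"],
)
```

A score like this stays in the [0..1] range, so it could be used as the disambiguation score (ds) for a suggestion.
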
> Those two suggestions look fine to me.
>
>
>
> > I will use the same algorithm to calculate the disambiguation score as
> > used in SolrMLT disambiguation engine in Stanbol.
> >
> > The algorithm:
> >
> >     dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))
> >
> >     oc ... original-confidence [0..1]
> >     ds ... disambiguation-score [0..1]
> >     dc ... disambiguated-confidence [0..1]
> >     cw ... original-confidence-weight
> >     dw ... disambiguation-weight
> >
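
The formula above is a weighted average of the original confidence and the disambiguation score. A minimal Python sketch follows; the function name, the default weights of 1.0 and the input validation are assumptions for illustration, not the actual Stanbol implementation.

```python
def disambiguated_confidence(oc, ds, cw=1.0, dw=1.0):
    """Compute dc = (oc * cw / (cw + dw)) + (ds * dw / (cw + dw)).

    oc: original confidence in [0..1]
    ds: disambiguation score in [0..1]
    cw: weight of the original confidence (default is an assumption)
    dw: weight of the disambiguation score (default is an assumption)
    """
    if not (0.0 <= oc <= 1.0 and 0.0 <= ds <= 1.0):
        raise ValueError("oc and ds must be in [0..1]")
    if cw < 0 or dw < 0 or cw + dw == 0:
        raise ValueError("weights must be non-negative and not both zero")
    total = cw + dw
    return (oc * cw / total) + (ds * dw / total)


equal_weights = disambiguated_confidence(0.8, 0.4)
favor_disambiguation = disambiguated_confidence(0.5, 1.0, cw=1.0, dw=3.0)
```

With equal weights the result is the plain mean of the two inputs. Raising dw relative to cw pulls the result towards the disambiguation score, and because both inputs are in [0..1] the result stays in [0..1] as well.
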
> >
> > Some questions I have:
> >
> >    - Is it a good idea to chain many enhancement engines other than my
> >    foaf-site-engine, such as NLP-Engines, TokenizerEngine, POSEngine, to
> >    provide as many Entity Suggestions as possible before executing
> >    disambiguation?
> >
> IMHO (1) "Co-reference by foaf-field literal matching" as you named it -
> or "Entity Linking" as I would call it - can be a combination of many
> engines. POS tagging, Named Entity Recognition, EntityhubLinking engines
> (configured for different foaf properties of your profiles), some Regex
> based engines for mail addresses and phone numbers, ... could all
> contribute to this.


> Also (2) and (3) could be done in multiple engines, but in that case you
> would need to find a solution to correctly calculate the final
> disambiguated fise:confidence value based on the individual results of the
> different disambiguation engines. Meaning that you will need to add some
> intermediate information to the RDF enhancement graph. If you do it in a
> single engine you could use a Java model for that. Because of that, IMO it
> should be simpler to start with a single engine.
>
Yes, initially I will start developing it in a single engine :)

>
> >
> >    - Can I use 'topic' enhancement engine in Stanbol to provide
> >    fise:TopicAnnotations required in the 3rd module?
> >
> If you can train a model based on your data it should work. The engine
> would give you topics for the parsed text and your disambiguation engine
> would compare the topics referenced by possible FOAF profiles with the ones
> detected by the TopicEngine for the text.
>
> But I am wondering where you can get the training data for such a model.
> You would need a set of documents for all the categories used by FOAF
> files.
>

I think I haven't yet properly grasped the topic-annotation concept in
Stanbol. I was expecting to configure the topic engine in the enhancement
chain, retrieve TopicAnnotations out of the box, and match those
TopicAnnotations against the foaf:primaryTopic and foaf:interest properties
in my engine. Can I use an existing model rather than training one with the
FOAF site I have implemented? Forgive me if this is a stupid question :)

>




>
>
> >    - Does SentimentAnalysis engine work? if so will
> >    fise:SentimentAnnotations be useful for Topic based matching?
> >
> The sentiment engines do work, but I do not see how they can improve
> topic based matching. Can you maybe explain your intentions?
>
I was initially thinking that topics and sentiment summaries can be
correlated, and that these SentimentAnnotations could therefore be mapped to
foaf:primaryTopic/foaf:interest in the 3rd module suggested above.
Maybe this is not so practical :)

I will start implementing a simple model that uses several entity-linking
engines to propose as many EntityAnnotations as possible, and uses the 2nd
module's approach to co-reference FOAF relationships.
I will update the thread with my progress.

Thanks a lot for your valuable insight.

Regards,
Dileepa

> best
> Rupert
>
>
> >
> >
> > Would like your suggestions, ideas as much as possible to improve my FOAF
> > co-reference based disambiguation engine.
> >
> > Below is a block diagram of the workflow.
> >
> > [image: Inline image 1]
> >
> > source :
> >
> > http://creately.com/diagram/example/hjs4yd0e1/FOAF_Disambiguation_WorkFlow
> >
> > Thanks,
> > Dileepa
> >
> > Reference :
> > 1. "Computing FOAF Co-reference Relations with Rules and Machine
> > Learning", Jennifer Sleeman and Tim Finin, University of Maryland,
> > Baltimore County. In Proceedings of The Third International Workshop on
> > Social Data on the Web, November 2010.
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
