Hi Rupert and All,

On Mon, Aug 26, 2013 at 5:58 PM, Rupert Westenthaler <[email protected]> wrote:
> Hi Dileepa,
>
> Sorry for the late response, but I was on vacation from 31st Jul and after coming back I had overlooked your mail. Thx for reminding me via IRC.
>
> On Wed, Jul 31, 2013 at 11:06 PM, Dileepa Jayakody <[email protected]> wrote:
>
> > Hi All,
> >
> > As the third milestone of my project, I will describe here my initial design of the FOAF co-reference based entity disambiguation engine.
> >
> > The main disambiguation technique used here is FOAF co-reference. It aims to merge multiple fise:EntityAnnotations, identified by different surface mentions in the text, into a single FOAF entity by identifying co-reference relationships between the entity labels. I have an idea to introduce 2 new fise properties, fise:coref and fise:not-coref, to denote the co-reference relationships between the entities. I would like your thoughts on this idea.
> >
> > Contextual information extracted from the ContentItem will be used to identify the entity annotation best suited to the context of the selected text. The co-reference calculation will initially be done in a rule-based manner. Later, a machine learning approach (SVM based) will be adopted to improve the system.
> > The disambiguation engine will calculate a disambiguation score (ds) by performing FOAF co-reference operations on the contextual information extracted from the content items, and will modify the fise:confidence value of each entity suggestion.
> >
> > The basic co-reference rules to be used are:
> > {?p a owl:IFP . ?a ?p ?x . ?b ?p ?x} => {?a :coref ?b}
> > {?p a owl:FP . ?a ?p ?x . ?a ?p ?y} => {?x :coref ?y}
> >
> > IFP : inverse-functional property (owl:InverseFunctionalProperty)
> > FP : functional property (owl:FunctionalProperty)
> > coref : co-referent
>
> I would advise using "all" properties defined by the ontology, and not only the functional ones (as indicated above).

I guess you are referring to the properties defined in propertyfilter.config during indexing?
Yes, I have configured it with foaf:* to include all FOAF properties during indexing.

> > The co-reference operations will be of 3 main types. These 3 types will be implemented as sub-modules in the disambiguation engine. The Map<TextAnnotation,Set<Suggestions>> will pass through each module (in chain mode) for improved disambiguation results. The disambiguation process mainly targets people; it could also be used to disambiguate organizations.
> >
> > The 3 sub-modules in the engine are as follows.
> >
> > 1. Co-reference by foaf-field literal matching:
> > This module will perform similarity matching of entity labels against FOAF fields such as foaf:name, foaf:givenName, foaf:firstName, foaf:familyName and foaf:nick, and will update the confidence values of co-referring entities (e.g. matching firstName, givenName and nickname mentions in the content: 'Tim Berners-Lee' is also identified by the nickname 'timbl').
> > It should also detect TextAnnotations of email addresses (if available) and match them against the foaf:mbox and foaf:personalMailbox fields, and match phone numbers against foaf:phone.
> > In this module, direct literal matching is performed.
>
> Suggesting possible entities for mentions in the text is the responsibility of EntityLinking. IMO this first part does not have much to do with disambiguation; it is mainly needed to get the initial suggestions (fise:TextAnnotations with linked fise:EntityAnnotations). Those annotations will then be used for disambiguation in steps (2) and (3).
>
> There are already EnhancementEngines that can be used for linking against names, family names and nicknames. For email addresses and phone numbers you might need to write your own engines (maybe regex based).
>
> I would not recommend creating fise:EntityAnnotations for given names, as there will be far too many possibilities. In that regard, you should have a look at the Entity co-mention engine.
> This engine is able to suggest "Tim Berners-Lee" for mentions of "Tim" if "Tim Berners-Lee" was already mentioned by his full name earlier in the text.

Thanks for the pointer, I will take a look at the Entity co-mention engine for this purpose.

> > 2. Co-reference by relationship links:
> > This module performs a neighborhood comparison with other people and organizations mentioned in the context. The relationships will be analysed via the foaf:knows field. rdfs:seeAlso and owl:sameAs will be used as the main co-reference fields to detect different EntityAnnotations referring to the same entity.
> > To detect relationships with organizations, foaf:schoolHomepage and foaf:workplaceHomepage will be used.
> > To detect membership in groups, foaf:Group and foaf:member will be used as keys.
> > To detect the gender of the person in the context, the surface mentions he/she will be matched against foaf:gender.
> >
> > 3. Topic based matching:
> > The fise:TopicAnnotations will be matched against foaf:interest (links to a document), foaf:topic_interest (links to an agent/entity), foaf:topic and foaf:primaryTopic.
>
> Those two suggestions look fine to me.
>
> > I will use the same algorithm to calculate the disambiguation score as the one used in the SolrMLT disambiguation engine in Stanbol.
> >
> > The algorithm:
> >
> > dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))
> >
> > oc ... original confidence [0..1]
> > ds ... disambiguation score [0..1]
> > dc ... disambiguated confidence [0..1]
> > cw ... original confidence weight
> > dw ... disambiguation weight
> >
> > Some questions I have:
> >
> > - Is it a good idea to chain many enhancement engines other than my foaf-site engine, such as NLP engines, TokenizerEngine and POSEngine, to provide as many entity suggestions as possible before executing disambiguation?
>
> IMHO (1), "Co-reference by foaf-field literal matching" as you named it (or "Entity Linking" as I would call it), can be a combination of many engines. POS tagging, Named Entity Recognition, EntityhubLinking engines (configured for different FOAF properties of your profiles), some regex-based engines for mail addresses and phone numbers, ... could all contribute to this.
> Also, (2) and (3) could be done in multiple engines, but in that case you would need to find a way to correctly calculate the final disambiguated fise:confidence value based on the individual results of the different disambiguation engines. This means that you would need to add some intermediate information to the RDF enhancement graph. If you do it in a single engine, you could use a Java model for that. Because of that, IMO it should be simpler to start with a single engine.

Yes, initially I will start developing it in a single engine :)

> > - Can I use the 'topic' enhancement engine in Stanbol to provide the fise:TopicAnnotations required in the 3rd module?
>
> If you can train a model based on your data, it should work. The engine would give you topics for the parsed text, and your disambiguation engine would compare the topics referenced by the candidate FOAF profiles with the ones detected by the TopicEngine for the text.
>
> But I am wondering where you can get the training data for such a model. You would need a set of documents for all the categories used by the FOAF files.

I think I haven't yet grasped the topic-annotation concept in Stanbol properly. I was expecting to use the topic engine configured in the enhancement chain, retrieve TopicAnnotations out of the box, and match those TopicAnnotations against the foaf:primaryTopic and foaf:interest properties in my engine. Instead of training a model with the FOAF site I have implemented, can I use an existing model?
Forgive me if this is a stupid question :)

> > - Does the SentimentAnalysis engine work? If so, will fise:SentimentAnnotations be useful for topic based matching?
>
> The sentiment engines do work, but I do not see how they could improve topic based matching. Can you maybe explain your intentions?

I was initially thinking that topics and sentiment summaries could be correlated, so I could use these SentimentAnnotations to map against foaf:primaryTopic/foaf:interest in the 3rd module suggested above. Maybe this is not so practical :)

I will start by implementing a simple model that uses several entity-linking engines to propose as many EntityAnnotations as possible, and then uses the 2nd module's approach to co-reference FOAF relationships. I will update the thread with my progress.

Thanks a lot for your valuable insight.

Regards,
Dileepa

> best
> Rupert
>
> > I would like your suggestions and ideas, as many as possible, to improve my FOAF co-reference based disambiguation engine.
> >
> > Below is a block diagram of the workflow.
> >
> > [image: Inline image 1]
> >
> > source: http://creately.com/diagram/example/hjs4yd0e1/FOAF_Disambiguation_WorkFlow
> >
> > Thanks,
> > Dileepa
> >
> > Reference:
> > 1. Jennifer Sleeman and Tim Finin (University of Maryland, Baltimore County), "Computing FOAF Co-reference Relations with Rules and Machine Learning", in Proceedings of the Third International Workshop on Social Data on the Web, November 2010.
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
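The disambiguated confidence formula quoted in the thread, dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw)), is a weighted average of the original confidence and the disambiguation score. A minimal Python sketch, assuming suggestions are plain (entity URI, oc, ds) tuples rather than real fise:EntityAnnotations; the function names, the weights and the ex: URIs are hypothetical and are not the Stanbol or SolrMLT engine API:

```python
def disambiguated_confidence(oc: float, ds: float, cw: float, dw: float) -> float:
    """Combine original confidence (oc) and disambiguation score (ds).

    dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))
    """
    if cw < 0 or dw < 0 or cw + dw == 0:
        raise ValueError("cw and dw must be non-negative and not both zero")
    for name, value in (("oc", oc), ("ds", ds)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0..1], got {value}")
    total = cw + dw
    return oc * cw / total + ds * dw / total


def rerank(suggestions, cw: float, dw: float):
    """Re-score (entity_uri, oc, ds) tuples and sort them by dc, highest first.

    The tuple layout is a hypothetical stand-in for fise:EntityAnnotations.
    """
    scored = [(uri, disambiguated_confidence(oc, ds, cw, dw))
              for uri, oc, ds in suggestions]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates for the mention "Tim"
    candidates = [
        ("ex:timbl", 0.6, 0.9),      # weaker label match, strong FOAF evidence
        ("ex:tim-other", 0.7, 0.1),  # better label match, little FOAF evidence
    ]
    for uri, dc in rerank(candidates, cw=1.0, dw=2.0):
        print(f"{uri}: {dc:.2f}")
```

With cw = 1 and dw = 2 this prints "ex:timbl: 0.80" followed by "ex:tim-other: 0.30", so the FOAF evidence reverses the original ranking. With cw = dw the result is simply the mean of oc and ds; raising dw lets the co-reference evidence dominate.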
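The two rule-based co-reference rules in the proposal (the IFP rule and the FP rule) can be read as simple grouping operations over triples. A minimal Python sketch, assuming triples are (subject, predicate, object) string tuples and that the sets of inverse-functional and functional properties have already been read from the ontology; the function name coref_pairs and all ex: data are hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def coref_pairs(triples, ifps, fps):
    """Apply the two co-reference rules to (s, p, o) triples.

    IFP rule: {?p a owl:IFP . ?a ?p ?x . ?b ?p ?x} => {?a :coref ?b}
    FP rule:  {?p a owl:FP . ?a ?p ?x . ?a ?p ?y} => {?x :coref ?y}
    Returns a set of unordered pairs (frozensets) of co-referent nodes.
    """
    subjects_by_value = defaultdict(set)  # (p, x) -> {?a, ?b, ...} for IFPs
    values_by_subject = defaultdict(set)  # (a, p) -> {?x, ?y, ...} for FPs
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value[(p, o)].add(s)
        if p in fps:
            values_by_subject[(s, p)].add(o)

    pairs = set()
    for group in [*subjects_by_value.values(), *values_by_subject.values()]:
        # Every two members of a group co-refer; single-member groups yield nothing.
        for a, b in combinations(sorted(group), 2):
            pairs.add(frozenset((a, b)))
    return pairs


if __name__ == "__main__":
    # Hypothetical data: foaf:mbox treated as an IFP, foaf:primaryTopic as an FP.
    triples = [
        ("ex:a", "foaf:mbox", "mailto:timbl@example.org"),
        ("ex:b", "foaf:mbox", "mailto:timbl@example.org"),
        ("ex:c", "foaf:mbox", "mailto:someone@example.org"),
        ("ex:doc1", "foaf:primaryTopic", "ex:x1"),
        ("ex:doc1", "foaf:primaryTopic", "ex:x2"),
    ]
    found = coref_pairs(triples, ifps={"foaf:mbox"}, fps={"foaf:primaryTopic"})
    for pair in sorted(tuple(sorted(p)) for p in found):
        print(pair)
```

This only emits direct pairs. A real engine would probably also take the transitive closure of the pairs (for example with a union-find structure) and, as Rupert suggests, consider more of the ontology's properties than just the functional ones.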
