Hi All,

As the third milestone of my project, I will describe my initial design of the FOAF co-reference based entity disambiguation engine here.

The main disambiguation technique used here is FOAF co-reference. It aims to merge multiple fise:EntityAnnotations, identified from different surface mentions in the text, into a single FOAF entity by identifying co-reference relationships between the entity labels. I have an idea to introduce two new fise properties, fise:coref and fise:not-coref, to denote the co-reference relationships between entities. I would like your thoughts on this idea.

Contextual information extracted from the ContentItem will be used to identify the most suitable entity annotation for the selected-text context. The co-reference calculation will initially be done in a rule-based manner. Later, a machine learning approach (SVM based) will be added to upgrade the system. The disambiguation engine will calculate a disambiguation score (ds) by performing FOAF co-reference operations on the contextual information extracted from the content items, and will modify the fise:confidence value of each entity suggestion accordingly.

The basic co-reference rules to be used are:

  {?p a owl:IFP . ?a ?p ?x . ?b ?p ?x} => {?a :coref ?b}
  {?p a owl:FP . ?a ?p ?x . ?a ?p ?y} => {?x :coref ?y}

  IFP   : inverse-functional property
  FP    : functional property
  coref : co-referent

The co-reference operations will be of 3 main types, each implemented as a sub-module of the disambiguation engine. The Map<TextAnnotation,Set<Suggestions>> will pass through each module in turn (chain mode) for improved disambiguation results. The disambiguation process mainly targets People disambiguation; it could also be used for Organization disambiguation. The 3 sub-modules in the engine are as follows.

1. Co-reference by foaf-field literal matching: This module will perform similarity matching of entity-label fields against FOAF fields such as foaf:name, foaf:givenName, foaf:firstName, foaf:familyName and foaf:nick, and update the confidence values of co-referring entities (e.g. matching firstName, givenName and nickname mentions in the content: 'Tim Berners-Lee' is also identified by the nickname 'timbl'). It should also detect TextAnnotations of email addresses (if available) and match them with the foaf:mbox and foaf:personalMailbox fields, and match phone numbers with foaf:phone. In this module, direct literal matching is performed.

2. Co-reference by relationship links: This module performs a neighborhood comparison with other People and Organizations mentioned in the context. Relationships will be analysed via the foaf:knows field. rdfs:seeAlso and owl:sameAs will be used as the main co-reference fields to detect different EntityAnnotations referring to the same entity. To detect relationships with organizations, foaf:schoolHomepage and foaf:workplaceHomepage will be used. To detect membership in groups, foaf:Group and foaf:member will be used as keys. To detect the gender of a person in the context, surface mentions such as 'he'/'she' will be matched against foaf:gender.

3. Topic based matching: The fise:TopicAnnotations will be matched against foaf:interest (links to a document), foaf:topic_interest (links to an agent/entity), foaf:topic and foaf:primaryTopic.

I will use the same algorithm as the SolrMLT disambiguation engine in Stanbol to calculate the disambiguated confidence from the disambiguation score. The algorithm:

  dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))

  oc ... original-confidence [0..1]
  ds ... disambiguation-score [0..1]
  dc ... disambiguated-confidence [0..1]
  cw ... original-confidence-weight
  dw ... disambiguation-weight

Some questions I have:

- Is it a good idea to chain other enhancement engines besides my foaf-site-engine (such as the NLP engines, TokenizerEngine, POSEngine) to provide as many entity suggestions as possible before executing disambiguation?
- Can I use the 'topic' enhancement engine in Stanbol to provide the fise:TopicAnnotations required in the 3rd module?
- Does the SentimentAnalysis engine work? If so, will fise:SentimentAnnotations be useful for topic based matching?

I would like as many suggestions and ideas as possible to improve my FOAF co-reference based disambiguation engine.

Below is a block diagram of the workflow.

[image: Inline image 1]
source: http://creately.com/diagram/example/hjs4yd0e1/FOAF_Disambiguation_WorkFlow

Thanks,
Dileepa

Reference:
1. Jennifer Sleeman and Tim Finin (University of Maryland, Baltimore County), "Computing FOAF Co-reference Relations with Rules and Machine Learning", in Proceedings of the Third International Workshop on Social Data on the Web, November 2010.
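
P.S. To make the two co-reference rules and the confidence formula above more concrete, here is a minimal sketch. It is written in Python only for brevity (the engine itself would live in Stanbol, in Java) and is not the planned implementation. Everything named in it is an assumption for illustration: the plain (subject, property, value) tuples standing in for the RDF graph, the function names coref_pairs and disambiguated_confidence, the choice of foaf:mbox as the example IFP and foaf:primaryTopic as the example FP, and the default weights cw = dw = 1.0.

```python
from itertools import combinations

# Assumed example properties; a real engine would read these from the ontology.
INVERSE_FUNCTIONAL = {"foaf:mbox"}       # IFP: a value identifies one subject
FUNCTIONAL = {"foaf:primaryTopic"}       # FP: a subject has one value


def coref_pairs(triples, ifps=INVERSE_FUNCTIONAL, fps=FUNCTIONAL):
    """Return the co-referent pairs implied by the IFP and FP rules.

    triples is an iterable of (subject, property, value) tuples, a
    hypothetical stand-in for the RDF graph of the enhancement results.
    """
    same_value = {}    # (p, x) -> subjects sharing value x   (IFP rule)
    same_subject = {}  # (a, p) -> values of a single subject (FP rule)
    for s, p, o in triples:
        if p in ifps:
            same_value.setdefault((p, o), set()).add(s)
        if p in fps:
            same_subject.setdefault((s, p), set()).add(o)

    pairs = set()
    for group in list(same_value.values()) + list(same_subject.values()):
        # Every two members of a group are co-referent; sort for stable pairs.
        pairs.update(combinations(sorted(group), 2))
    return pairs


def disambiguated_confidence(oc, ds, cw=1.0, dw=1.0):
    """dc := (oc * cw / (cw + dw)) + (ds * dw / (cw + dw)).

    The default weights are assumptions, not values taken from Stanbol.
    """
    return (oc * cw / (cw + dw)) + (ds * dw / (cw + dw))
```

For example, two annotations that share the same foaf:mbox value would come back from coref_pairs as one pair, and with equal weights disambiguated_confidence(0.6, 0.9) is simply the mean of the original confidence and the disambiguation score. A larger dw would let the co-reference evidence dominate the original confidence.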