[
https://issues.apache.org/jira/browse/STANBOL-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312628#comment-14312628
]
Cristian Petroaca commented on STANBOL-1279:
--------------------------------------------
Some corrections to the examples given in the last comment:
1. Spatial : Angela Merkel -> The German politician (currently, chancellor is
not a class of Angela Merkel in yago)
2. Org membership : Rolling Stones is not recognized as a named entity by
OpenNLP at the moment, but a good example would be Bill Gates -> The Microsoft
executive.
> Named Entity co-reference resolution engine based on yago/dbpedia contextual
> information
> ----------------------------------------------------------------------------------------
>
> Key: STANBOL-1279
> URL: https://issues.apache.org/jira/browse/STANBOL-1279
> Project: Stanbol
> Issue Type: New Feature
> Components: Enhancement Engines
> Reporter: Cristian Petroaca
> Assignee: Rupert Westenthaler
> Labels: co-reference, dbpedia, entity, named, yago
> Attachments: named_entity_coref_ver_1.patch,
> named_entity_coref_ver_2.patch, named_entity_coref_ver_3.patch
>
>
> Develop an enhancement engine that will perform co-reference resolution of
> Named Entities in a given text. The co-references will be noun phrases which
> refer to those Named Entities by having a minimal set of attributes which
> match contextual information (yago rdf:type and dbpedia spatial and object
> function giving info - more on this below) from dbpedia/yago for that Named
> Entity.
> We have the following text as an example : "Microsoft has posted its 2013
> earnings. The software company did better than expected. ... The
> Redmond-based company will hire 500 new developers this year."
> The enhancement engine will link "Microsoft" with "The software company" and
> "The Redmond-based company".
> Below there are the steps necessary in order to extract the co-references.
> Named Entity extraction
> ==================
> Extract all Named Entities from the given text. If there are no Named
> Entities then the process stops here.
> Noun Phrases extraction
> ===================
> Select all noun phrases after the first Named Entity that have:
> + a determiner which implies reference to an entity local to the text (such
> as "the", "this", "these"), but not one which implies a reference to an
> entity outside of the text (such as "another", "every", etc).
> + at least one other noun, aside from the main required noun, which further
> describes it. For example, I will not count "The company" as a legitimate
> candidate, since this could create a lot of false positives given the double
> meaning of some words, as in "in the company of good people".
> All noun phrases need to be lemmatized in case there are any plurals.
> This step should have different logic implemented for different languages.
> This step ensures good recall.
>
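The candidate filter described above can be sketched roughly as follows. This
is only an illustration, not code from the attached patches: the Token type,
the coarse POS tag names and the naive plural stripping are all assumptions,
and a real engine would rely on the POS tagger and lemmatizer of the language
in question. Modifiers such as "Redmond-based" would need extra handling that
this sketch does not attempt.

```python
from dataclasses import dataclass

# Determiners assumed to refer to an entity local to the text.
LOCAL_DETERMINERS = {"the", "this", "these"}


@dataclass
class Token:
    """Hypothetical token: surface text plus a coarse POS tag."""
    text: str
    pos: str  # assumed tag set, e.g. "DET", "NOUN", "PROPN", "ADJ"


def naive_lemma(word):
    """Very rough plural stripping; a stand-in for a real lemmatizer."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def noun_lemmas(phrase):
    """Lemmatized nouns of a noun phrase, in order of appearance."""
    return [naive_lemma(t.text) for t in phrase if t.pos in ("NOUN", "PROPN")]


def is_candidate(phrase):
    """Keep phrases that start with a local determiner and have >= 2 nouns."""
    if not phrase or phrase[0].text.lower() not in LOCAL_DETERMINERS:
        return False
    return len(noun_lemmas(phrase)) >= 2
```

With this filter "The software company" is kept, while "The company" and
"Another software company" are rejected.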
> Noun Phrases matching
> ===================
> This step tries to match the previously selected noun phrases to the Named
> Entities from step 1 and establish the co-references.
> For every noun phrase the following rules will be applied:
> Yago:class matching
> --------------------------
> For each NER prior to the current noun phrase in the text match the
> yago:class label to the contents of the noun phrase. If there are no matches
> then drop the current noun phrase.
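One possible reading of the class matching rule, as a small sketch. It assumes
that a class label matches when all of its words occur among the lemmatized
words of the noun phrase; the description does not fix the exact matching
criterion, so this is an assumption. The function names and the data shape
(a list of (entity name, class labels) pairs) are hypothetical.

```python
def class_label_matches(np_lemmas, class_labels):
    """Return the class labels whose words all occur in the noun phrase."""
    np_words = set(np_lemmas)
    return [
        label for label in class_labels
        if set(label.lower().split()) <= np_words
    ]


def filter_entities_by_class(np_lemmas, prior_entities):
    """Map each prior entity to its matching class labels.

    prior_entities: list of (name, class_labels) for the named entities that
    appear before the noun phrase. An empty result means the noun phrase
    should be dropped.
    """
    matches = {}
    for name, labels in prior_entities:
        hits = class_label_matches(np_lemmas, labels)
        if hits:
            matches[name] = hits
    return matches
```

The class labels passed in would come from the yago rdf:type information of
each entity; the labels used when trying this out are made up for
illustration.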
> Group membership rules matching
> -------------------------------------------
> For each NER prior to the current noun phrase:
> + Spatial membership : the noun phrase is part of a LOCATION.
> If the noun phrase contains a LOCATION or a demonym then check any location
> properties of the matching NER. These properties will be part of a generic
> ontology. For clarity I will describe the dbpedia extracted properties which
> will be aligned to this generic ontology.
> If matching NER is a :
> - person, match against :birthPlace, :region, :nationality
> - organisation, match against :foundationPlace, :locationCity, :location,
> :hometown
> - place, match against :country, :subdivisionName, :location.
> Example: The Italian President, The Richmond-based company
> + Organisational membership : the NER is part of an ORGANISATION.
> If the noun phrase contains an ORGANISATION then check the following
> properties of the matching NER. These properties will be part of a generic
> ontology. For clarity I will describe the dbpedia extracted properties which
> will be aligned to this generic ontology.
> If matching NER is :
> - person, match against :occupation, :associatedActs
> - organisation : no dbpedia properties to match
> - location : no dbpedia properties to match
> Example: The Microsoft executive, The Pink Floyd singer
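The two group membership rules boil down to a lookup of which properties to
check for each entity type. A rough sketch follows; the property names are the
dbpedia ones listed above, while the function name, the shape of entity_props
(property name -> list of string values) and the substring comparison are
assumptions. The mention is assumed to be already normalised, for example a
demonym already mapped to its place name.

```python
# dbpedia properties to check per entity type, as listed in the description.
SPATIAL_PROPERTIES = {
    "person": ["birthPlace", "region", "nationality"],
    "organisation": ["foundationPlace", "locationCity", "location", "hometown"],
    "place": ["country", "subdivisionName", "location"],
}
ORG_PROPERTIES = {
    "person": ["occupation", "associatedActs"],
    "organisation": [],  # no dbpedia properties to match
    "place": [],         # no dbpedia properties to match
}


def membership_match(entity_type, entity_props, np_mention, mention_type):
    """Return the properties of the entity whose values contain the mention.

    mention_type is the type of the entity found inside the noun phrase:
    "LOCATION" selects the spatial rule, "ORGANISATION" the organisational one.
    """
    if mention_type == "LOCATION":
        table = SPATIAL_PROPERTIES
    elif mention_type == "ORGANISATION":
        table = ORG_PROPERTIES
    else:
        raise ValueError("unsupported mention type: %r" % (mention_type,))
    needle = np_mention.lower()
    return [
        prop for prop in table.get(entity_type, [])
        if any(needle in value.lower() for value in entity_props.get(prop, []))
    ]
```

For example, an organisation whose locationCity value contains "Redmond" would
match the mention "Redmond" in "The Redmond-based company" through the spatial
rule.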
> Functional description rules matching
> -----------------------------------------------
> The noun phrase describes what the NER does conceptually.
> If there are no NERs in the noun phrase then match the following properties
> of the matching NER to the contents of the noun phrase (aside from the nouns
> which are part of the yago:class) :
> If NER is a:
> - person : no dbpedia properties to match
> - organisation : match against :service, :industry, :genre
> - location : no dbpedia properties to match
> Example: The software company.
> If no matches were found for the current NER with the "Group membership"
> and "Functional description" rules, then if the yago:class which matched has
> more than 2 nouns we also consider this a good co-reference, but possibly
> with a lower confidence.
> Ex: The former tennis player, the theoretical physicist.
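The property lookup in the functional description rule could look roughly like
this. The property names are the ones listed above; the function name, the
data shapes and the word-overlap comparison are assumptions made for the
sketch, and the fallback for class-only matches is not covered here.

```python
# dbpedia properties to check per entity type for the functional rule.
FUNCTIONAL_PROPERTIES = {
    "person": [],  # no dbpedia properties to match
    "organisation": ["service", "industry", "genre"],
    "place": [],   # no dbpedia properties to match
}


def functional_match(entity_type, entity_props, np_lemmas, class_words):
    """Return the functional properties that share a word with the phrase.

    Words that are part of the matched yago class are excluded first, so only
    the remaining words of the noun phrase are compared with property values.
    """
    remaining = set(np_lemmas) - set(class_words)
    matched = []
    for prop in FUNCTIONAL_PROPERTIES.get(entity_type, []):
        for value in entity_props.get(prop, []):
            if remaining & set(value.lower().split()):
                matched.append(prop)
                break  # one matching value per property is enough
    return matched
```

With "The software company" and the class word "company", the remaining word
"software" would match an :industry value that contains "software".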
> Co-references creation
> ==================
> Based on the number of nouns which matched from the previous step we create a
> confidence level. The number of matched nouns cannot be lower than 2 and we
> must have a yago:class match.
> For all NERs which got to this point, select the closest ones in the text to
> the noun phrase which matched against the same properties (yago:class and
> dbpedia) and mark them as co-references.
> The "Noun Phrases matching" and "Co-references creation" steps are designed
> to filter out all bad co-references and ensure good precision.
>
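A minimal sketch of the co-reference creation step, under stated assumptions.
The description only says that the confidence is based on the number of
matched nouns, so the scale below (matched nouns divided by 4, capped at 1.0)
is arbitrary, and the function names and the (name, offset, matched nouns)
tuples are hypothetical.

```python
def confidence(matched_nouns, has_class_match):
    """Confidence from the number of matched nouns (arbitrary scale).

    Returns 0.0 when there is no yago:class match or fewer than 2 nouns
    matched, since such a pair is not a valid co-reference.
    """
    if not has_class_match or matched_nouns < 2:
        return 0.0
    return min(1.0, matched_nouns / 4)


def closest_prior_entity(np_offset, candidates):
    """Pick the qualifying entity that is closest before the noun phrase.

    candidates: list of (name, offset, matched_nouns) for entities that
    passed the matching step. Returns None when nothing qualifies.
    """
    prior = [c for c in candidates if c[1] < np_offset and c[2] >= 2]
    if not prior:
        return None
    return max(prior, key=lambda c: c[1])
```

Taking the closest preceding entity is one reading of "select the closest
ones in the text"; the names and offsets used to try it out are invented.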
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)