Hi Cristian,
Without more details about your concrete heuristic, in my honest
opinion such an approach could produce a lot of false positives. I don't
know if you are planning to use some "locality" features to detect such
coreferences, but you need to take into account that coreferent mentions
quite often occur in different paragraphs.
Although I'm not an expert in Natural Language Understanding, I would
say it is quite difficult to get decent precision/recall rates for
coreference resolution using fixed rules. Maybe you can give other tools
a try, such as BART (http://www.bart-coref.org/).
Cheers,
Rafa Haro
On 30/01/14 10:33, Cristian Petroaca wrote:
Hi,
One of the necessary steps for implementing the Event extraction Engine
feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have
coreference resolution in the given text. This is now provided via the
stanford-nlp project, but as far as I can see this module performs mostly
pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
resolution.
In order to get more coreferences from the text I thought of creating some
logic that would detect this kind of coreference:
"Apple reaches new profit heights. The software company just announced its
2013 earnings."
Here "The software company" obviously refers to "Apple".
So I'd like to detect noun phrases that corefer with a Named Entity:
phrases whose noun matches the rdf:type of the Named Entity, in this case
"company", and whose attributes can be found in the DBpedia categories of
the Named Entity, in this case "software".
The detection of coreferences such as "The software company" in the text
would also be done by either using the new Pos Tag Based Phrase extraction
Engine (noun phrases) or by using a dependency tree of the sentence and
picking up only subjects or objects.
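A minimal sketch of the matching rule described above, in Python rather
than as a Stanbol engine. Everything here is an assumption made for
illustration: the NamedEntity class, the matches and resolve functions,
and the idea that the type labels and category words have already been
fetched and lower-cased. It treats the last word of the noun phrase as
the head noun and every word before it as an attribute.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Named Entity already linked to DBpedia.
@dataclass
class NamedEntity:
    name: str
    type_labels: set      # lower-cased labels of its rdf:type values
    category_words: set   # lower-cased words taken from its DBpedia categories

DETERMINERS = {"the", "this", "that", "these", "those"}

def matches(noun_phrase, entity):
    """Return True if a definite noun phrase could refer to the entity."""
    words = [w.lower() for w in noun_phrase.split()]
    # Only consider definite descriptions such as "The software company".
    if len(words) < 2 or words[0] not in DETERMINERS:
        return False
    head, modifiers = words[-1], words[1:-1]
    # The head noun must be one of the entity's type labels ...
    if head not in entity.type_labels:
        return False
    # ... and every attribute must appear in its category words.
    return all(m in entity.category_words for m in modifiers)

def resolve(noun_phrase, preceding_entities):
    """Pick the most recently mentioned entity that matches, or None."""
    for entity in reversed(preceding_entities):
        if matches(noun_phrase, entity):
            return entity
    return None
```

With an entity built as NamedEntity("Apple", {"company"}, {"software"}),
resolve("The software company", [...]) would return it. Note that a bare
"The company" matches every entity of type company, so a real engine
would need some way to rank or reject such candidates.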
At this point I'd like to know if this kind of logic would be useful as a
separate Enhancement Engine (in case the precision and recall are good
enough) in Stanbol?
Thanks,
Cristian