The Named Entity Recognizer can be used to find mentions
of certain entity types in a document/article.
For example, it can detect the spans in a text which contain
person names.
The output could look like this:
<START:person> Pierre Vinken <END> will join the board as a ...
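To show how a consumer might read that output, here is a rough sketch in Python that pulls the tagged spans back out of the annotated text. It only assumes the tag format shown above; the names SPAN and extract_mentions are made up for illustration and are not part of OpenNLP.

```python
import re

# Matches one annotated span such as "<START:person> Pierre Vinken <END>".
# The entity type is whatever follows "START:" up to the closing bracket.
SPAN = re.compile(r"<START:(?P<type>[^>\s]+)>\s*(?P<text>.*?)\s*<END>")


def extract_mentions(annotated):
    """Return (entity_type, mention_text) pairs in order of appearance."""
    return [(m.group("type"), m.group("text")) for m in SPAN.finditer(annotated)]


sample = "<START:person> Pierre Vinken <END> will join the board as a ..."
print(extract_mentions(sample))  # [('person', 'Pierre Vinken')]
```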
As far as I understand, after detecting that "Pierre Vinken" is a person
name mention, it must still be resolved to a specific person (e.g. linked
to a unique id) to be useful for a semantic CMS. Even without that step, a
text search system could limit its search to the person mentions (the text
between the start and end tags) and already improve its precision on
certain search queries, e.g. a search for Three Mobile.
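A small sketch of the search idea, again in Python and again only assuming the tag format from the sample output. The function search_mentions and the second document are invented for illustration; a real search system would use an index rather than scanning the text.

```python
import re

# Same tag format as in the sample output: <START:type> mention text <END>
SPAN = re.compile(r"<START:(?P<type>[^>\s]+)>\s*(?P<text>.*?)\s*<END>")


def search_mentions(annotated_docs, query, entity_type):
    """Return indices of documents where query occurs inside a mention of entity_type."""
    needle = query.lower()
    hits = []
    for index, doc in enumerate(annotated_docs):
        for m in SPAN.finditer(doc):
            if m.group("type") == entity_type and needle in m.group("text").lower():
                hits.append(index)
                break  # one matching mention is enough for this document
    return hits


docs = [
    "<START:person> Pierre Vinken <END> will join the board as a ...",
    "Pierre is a common first name.",  # hypothetical document without any mention
]
print(search_mentions(docs, "pierre", "person"))  # [0]
```

A plain substring search for "pierre" would return both documents; restricting the match to person mentions drops the second one, which is where the precision gain comes from.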
OpenNLP does not yet have a component for this entity identification or
disambiguation, but I plan to add one in the future. Another thing which
could greatly help to identify an entity is the coreference component,
which can be used to link multiple mentions of an entity together.
The article from which I took this small sample might later mention
Pierre Vinken again as "Pierre" or simply as "him". The coreference
component could then link all these mentions together.
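To make the result of such linking concrete, here is a minimal sketch of how a chain of mentions could be held as data. It does not perform any coreference resolution: the chain is filled in by hand, and MentionChain and the id "person-1" are hypothetical names, not OpenNLP API.

```python
from dataclasses import dataclass, field


@dataclass
class MentionChain:
    """All mentions that refer to one entity (built by hand here, not computed)."""
    entity_id: str
    mentions: list = field(default_factory=list)

    def add(self, mention):
        self.mentions.append(mention)


chain = MentionChain(entity_id="person-1")  # hypothetical unique id
for mention in ["Pierre Vinken", "Pierre", "him"]:
    chain.add(mention)

print(chain.entity_id, chain.mentions)
```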
As far as I know, Stanbol is the only project which
needs to detect semantics in natural language text
and is already using OpenNLP, but I might be wrong.
Jörn
On 11/19/10 8:47 AM, Paolo Castagna wrote:
Andreas Kuckartz wrote:
http://wiki.apache.org/incubator/OpenNLPProposal
Out of curiosity: Are there noteworthy relations to these projects?
Apache Stanbol
Apache Jena
Apache Clerezza
Cheers,
Andreas
My understanding so far is that:
Stanbol --> Clerezza --> {Jena, Sesame, ...}
where --> means "depends on"
Paolo
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org