[
https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693925#comment-13693925
]
Mark Giaconia edited comment on OPENNLP-579 at 6/27/13 1:58 AM:
----------------------------------------------------------------
- GeoEntityLinker is functional (in a basic sense) against the USGS and
Geonames gazetteers. I ran ~100K sentences through it and produced about 20K
locations, and the results look pretty good (currently it is a "high precision,
low recall" approach...)
- Implements the concept of "country context" at the doc level to help resolve
locations
- needs a better scoring approach
- currently no fuzzy string matching is being used to match the initial NER
result with the gazetteer entries... it is a boolean-type search against a MySQL
text index. I may try doing a fuzzy search, and then using something like an
n-gram signature comparison to score the match at a finer level. Speed is
currently good thanks to the MySQL text index, but recall would suffer with
obscure location names.
- Filtering with country context is configurable (when true, only locations
within the countries found at the document level will be returned)
- the list of items that indicate countries is in the database, so it is
extensible
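
To illustrate the country context filter described above, here is a minimal
sketch. It is not the GeoEntityLinker code: the class, the field names, and
the use of two-letter country codes are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of document-level country context filtering.
// None of these names come from the attached EntityLinker code.
final class CountryContextFilter {

    /** Hypothetical gazetteer hit: a place name and the country it lies in. */
    static final class GazetteerHit {
        final String name;
        final String countryCode; // assumed to be a two-letter code such as "FR"

        GazetteerHit(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }
    }

    /**
     * When filterByCountryContext is true, keeps only the hits whose country
     * was found at the document level; when false, returns every hit.
     */
    static List<GazetteerHit> filter(List<GazetteerHit> hits,
                                     Set<String> documentCountries,
                                     boolean filterByCountryContext) {
        if (!filterByCountryContext) {
            return new ArrayList<>(hits);
        }
        List<GazetteerHit> kept = new ArrayList<>();
        for (GazetteerHit hit : hits) {
            if (documentCountries.contains(hit.countryCode)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

With the filter on, a document that only mentions France would keep
"Paris, FR" and drop "Paris, US"; with it off, both would be returned.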
General capability provided by GeoEntityLinker: finding geonames within
unstructured text by linking a gazetteer to location named entities
(geotagging, georeferencing, geo-enabling text)
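
As a rough idea of what the n-gram scoring mentioned above could look like,
the sketch below scores an NER result against a gazetteer entry by the Jaccard
overlap of their character n-grams. Jaccard overlap is my own choice of
measure for illustration, and the class and method names are hypothetical;
nothing here is in the attached code yet.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: character n-gram overlap as a finer-grained match score.
final class NgramSimilarity {

    /** Returns the set of lower-cased character n-grams of the trimmed text. */
    static Set<String> ngrams(String text, int n) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        if (s.length() < n) {
            // Text shorter than n: use the whole string as its only "gram".
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** Jaccard overlap of the two n-gram sets, from 0.0 (disjoint) to 1.0 (identical). */
    static double score(String nerResult, String gazetteerEntry, int n) {
        Set<String> a = ngrams(nerResult, n);
        Set<String> b = ngrams(gazetteerEntry, n);
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        int unionSize = a.size() + b.size() - shared.size();
        return unionSize == 0 ? 0.0 : (double) shared.size() / unionSize;
    }
}
```

The idea would be to pull candidates with a fuzzy search first and then rank
them with something like NgramSimilarity.score(name, candidate, 3), so a near
miss such as "Pittsburg" against "Pittsburgh" still scores high.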
> Framework to dynamically link N-best matches from external data to named
> entities by type (EntityLinker framework)
> ------------------------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-579
> URL: https://issues.apache.org/jira/browse/OPENNLP-579
> Project: OpenNLP
> Issue Type: Wish
> Components: Name Finder
> Affects Versions: 1.6.0
> Environment: Any
> Reporter: Mark Giaconia
> Priority: Minor
> Labels: features
> Fix For: 1.6.0
>
> Attachments: EntityLinker_13Jun2013.zip, entityLinker_23Jun2013.zip,
> EntityLinker_26Jun2013.zip, EntityLinker_30may2013.zip,
> entitylinker_8Jun2013.zip, entitylinker_9Jun2013.zip,
> entitylinkerFramework.zip, entitylinker.properties, geonamefinder.properties,
> geonamefind.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> A framework for integrating/linking external data to named entities. For
> instance, geocoding or georeferencing location entities to geonames gazetteers
> can be implemented as an EntityLinker. I initially created this ticket
> specifically to solve the georeferencing problem, but the framework should allow
> linkage of any external data to any entity type. Commercial applications that
> do this are expensive, and there are many free gazetteers one could use to
> create solutions with OpenNLP. The capability should provide a default
> implementation using MySQL or Postgres and the USGS/Geonames gazetteers.