[ 
https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693925#comment-13693925
 ] 

Mark Giaconia edited comment on OPENNLP-579 at 6/27/13 1:58 AM:
----------------------------------------------------------------

- GeoEntityLinker is functional (in a basic sense) against the USGS and 
Geonames gazetteers. I ran ~100K sentences through it and produced about 20K 
locations, and the results look pretty good (currently it is a "high precision, 
low recall" approach...)
- Implements the concept of "country context" at the doc level to help resolve 
locations
- Needs a better scoring approach
- Currently no fuzzy string matching is used to match the initial NER 
result with the gazetteer entries; it is a boolean-type search against a MySQL 
text index. I may try doing a fuzzy search, and then using something like an 
n-gram signature comparison to score the match at a finer level. It currently 
performs well due to the MySQL text index, but recall would suffer with obscure 
location names.
- Filtering with country context is configurable (when true, only locations 
within the countries found at the document level will be returned)
- The list of items that indicate countries is in the database, so it is 
extensible
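
A minimal sketch of the configurable country-context filter described in the 
list above. Everything here is hypothetical and only illustrates the idea: the 
class CountryContextFilter, the Candidate type, and the parameter names are 
assumptions, not code from the attached patches or from OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of OpenNLP or the attached zip files. */
final class CountryContextFilter {

    /** Hypothetical gazetteer hit: a place name plus the country code it belongs to. */
    static final class Candidate {
        final String name;
        final String countryCode;

        Candidate(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }
    }

    /**
     * Returns all candidates when filtering is off. When it is on, keeps only
     * candidates whose country code was found at the document level.
     */
    static List<Candidate> filter(List<Candidate> candidates,
                                  Set<String> documentCountries,
                                  boolean filterByCountryContext) {
        if (!filterByCountryContext) {
            return new ArrayList<>(candidates);
        }
        List<Candidate> kept = new ArrayList<>();
        for (Candidate candidate : candidates) {
            if (documentCountries.contains(candidate.countryCode)) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Note that in this sketch, with filtering on and no countries detected in the 
document, nothing is returned; whether that is the desired behavior is a 
design choice.
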
General capability provided with GeoEntityLinker: finding geonames within 
unstructured text by linking a gazetteer to location named entities 
(geotagging, georeferencing, geo-enabling text)
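
One possible reading of the "ngram signature comparison" mentioned above is a 
Jaccard overlap between the sets of character n-grams of the NER mention and 
of the gazetteer name. The sketch below shows that idea only; the class 
NGramScorer, its method names, and the "_" padding are assumptions, not the 
scoring approach used in the attached patches.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of OpenNLP or the attached zip files. */
final class NGramScorer {

    /** Collects the distinct character n-grams of a lower-cased, padded string. */
    static Set<String> ngrams(String text, int n) {
        // Padding marks the start and end of the name so that edges count too.
        String padded = "_" + text.toLowerCase(Locale.ROOT).trim() + "_";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    /** Jaccard similarity of the two n-gram sets, a value between 0 and 1. */
    static double score(String mention, String gazetteerName, int n) {
        Set<String> a = ngrams(mention, n);
        Set<String> b = ngrams(gazetteerName, n);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // both too short to produce any n-gram
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // A near miss should score higher than an unrelated name.
        System.out.println(score("Pittsburg", "Pittsburgh", 2));
        System.out.println(score("Pittsburg", "Paris", 2));
    }
}
```

A fuzzy database search could produce the candidate list, and a score like 
this could then rank the candidates at a finer level.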
                
> Framework to dynamically link N-best matches from external data to named 
> entities by type (EntityLinker framework)
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-579
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-579
>             Project: OpenNLP
>          Issue Type: Wish
>          Components: Name Finder
>    Affects Versions: 1.6.0
>         Environment: Any
>            Reporter: Mark Giaconia
>            Priority: Minor
>              Labels: features
>             Fix For: 1.6.0
>
>         Attachments: EntityLinker_13Jun2013.zip, entityLinker_23Jun2013.zip, 
> EntityLinker_26Jun2013.zip, EntityLinker_30may2013.zip, 
> entitylinker_8Jun2013.zip, entitylinker_9Jun2013.zip, 
> entitylinkerFramework.zip, entitylinker.properties, geonamefinder.properties, 
> geonamefind.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> A framework for integrating/linking external data to named entities. For 
> instance, geocoding or georeferencing location entities to geonames gazetteers 
> can be implemented as an EntityLinker. The ticket was initially created to 
> specifically solve the georeferencing problem, but the framework should allow 
> linkage of any external data to any entity type. Commercial applications that 
> do this are expensive, and there are many free gazetteers one could use to 
> create solutions with OpenNLP. The capability should provide a default 
> implementation using MySQL or Postgres and the USGS/Geonames gazetteers.
