Adam Estrada created TIKA-1106:
----------------------------------

             Summary: CLAVIN Integration
                 Key: TIKA-1106
                 URL: https://issues.apache.org/jira/browse/TIKA-1106
             Project: Tika
          Issue Type: Wish
          Components: general
    Affects Versions: 1.3
         Environment: All
            Reporter: Adam Estrada
            Priority: Minor
             Fix For: 1.4


I've been evaluating CLAVIN as a way to extract location information from 
unstructured text. Integrating it with Tika in some way seems like it would 
make a lot of sense. From the CLAVIN website:

{quote}
CLAVIN (*Cartographic Location And Vicinity INdexer*) is an open source 
software package for document geotagging and geoparsing that employs 
context-based geographic entity resolution. It combines a variety of open 
source tools with natural language processing techniques to extract location 
names from unstructured text documents and resolve them against gazetteer 
records. Importantly, CLAVIN does not simply "look up" location names; rather, 
it uses intelligent heuristics in an attempt to identify precisely which 
"Springfield" (for example) was intended by the author, based on the context of 
the document. CLAVIN also employs fuzzy search to handle incorrectly-spelled 
location names, and it recognizes alternative names (e.g., "Ivory Coast" and 
"Côte d'Ivoire") as referring to the same geographic entity. By enriching text 
documents with structured geo data, CLAVIN enables hierarchical geospatial 
search and advanced geospatial analytics on unstructured data.
{quote}

I found only one other mention of the word "clavin" on the ASF JIRA site, 
so I thought it was worth posting here.

https://github.com/Berico-Technologies/CLAVIN

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
