Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "GeoTopicParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/GeoTopicParser?action=diff&rev1=1&rev2=2

  GeoTopicParser uses [[http://lucene.apache.org/|Apache Lucene]] and [[http://opennlp.apache.org/|Apache OpenNLP]] to provide its capabilities.

+ = Installing the Lucene Gazetteer =
+ 
+ First you will need to download and install the [[http://github.com/chrismattmann/lucene-geo-gazetteer|Lucene Geo Gazetteer]] project. You can do so by:
+ 
+ {{{
+ $ cd $HOME/src
+ $ git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
+ $ cd lucene-geo-gazetteer
+ $ mvn install
+ $ export PATH="$HOME/src/lucene-geo-gazetteer/src/main/bin:$PATH"
+ }}}
+ 
+ Once done, you can verify that the installation worked by running the following command:
+ 
+ {{{
+ $ lucene-geo-gazetteer --help
+ usage: lucene-geo-gazetteer
+  -b,--build <gazetteer file>           The Path to the Geonames
+                                        allCountries.txt
+  -h,--help                             Print this message.
+  -i,--index <directoryPath>            The path to the Lucene index
+                                        directory to either create or read
+  -s,--search <set of location names>   Location names to search the
+                                        Gazetteer for
+ }}}
+ 
+ You will now need to build a Gazetteer using the Geonames.org dataset. Instructions are provided below:
+ 
+ {{{
+ $ cd $HOME/src/lucene-geo-gazetteer
+ $ curl -O http://download.geonames.org/export/dump/allCountries.zip
+ $ unzip allCountries.zip
+ $ java -cp target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -b allCountries.txt
+ }}}
+ 
+ You can verify that the Gazetteer build worked by searching, e.g., for Pasadena and/or Texas:
+ 
+ {{{
+ $ lucene-geo-gazetteer -s Pasadena Texas
+ }}}
+ 
+ Note that we used the convenience script `lucene-geo-gazetteer`, which assumes that you created an index named geoIndex at $HOME/src/lucene-geo-gazetteer/geoIndex. We could have also used the pure Java command line to search.
+ 
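The page mentions a pure Java search without showing it. A minimal sketch follows, assuming the `GeoNameResolver` class accepts `-i` and `-s` together, as the `--help` output suggests. The names `GAZETTEER_HOME` and `gazetteer_search` are hypothetical helpers introduced here for illustration; they are not part of the lucene-geo-gazetteer project.

```shell
#!/bin/sh
# Sketch only: search the gazetteer with plain Java instead of the wrapper script.
# GAZETTEER_HOME and gazetteer_search are hypothetical names, not project API.
GAZETTEER_HOME="${GAZETTEER_HOME:-$HOME/src/lucene-geo-gazetteer}"

gazetteer_search() {
    # Use the first jar-with-dependencies that "mvn install" left in target/,
    # so the <version> placeholder does not have to be typed by hand.
    jar=$(ls "$GAZETTEER_HOME"/target/lucene-geo-gazetteer-*-jar-with-dependencies.jar 2>/dev/null | head -n 1)
    if [ -z "$jar" ]; then
        echo "no gazetteer jar under $GAZETTEER_HOME/target; run mvn install first" >&2
        return 1
    fi
    # Same class and options as the build step, with -s (search) in place of -b (build).
    java -cp "$jar" edu.usc.ir.geo.gazetteer.GeoNameResolver \
        -i "$GAZETTEER_HOME/geoIndex" -s "$@"
}
```

After sourcing the snippet, `gazetteer_search Pasadena Texas` should be equivalent to the wrapper-script call, provided the index was built at `$GAZETTEER_HOME/geoIndex`.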
