Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "GeoTopicParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/GeoTopicParser?action=diff&rev1=1&rev2=2

  
  GeoTopicParser uses [[http://lucene.apache.org/|Apache Lucene]] and 
[[http://opennlp.apache.org/|Apache OpenNLP]] to provide its capabilities.
  
+ = Installing the Lucene Gazetteer =
+ 
+ First you will need to download and install the 
[[http://github.com/chrismattmann/lucene-geo-gazetteer|Lucene Geo Gazetteer]] 
project. You can do so by:
+ 
+ {{{
+ $ cd $HOME/src
+ $ git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
+ $ cd lucene-geo-gazetteer
+ $ mvn install
+ $ export PATH=$HOME/src/lucene-geo-gazetteer/src/main/bin:$PATH
+ }}}
+ 
+ Once done, you can verify that the installation worked by running the 
following command:
+ 
+ {{{
+ $ lucene-geo-gazetteer --help
+ usage: lucene-geo-gazetteer
+  -b,--build <gazetteer file>           The Path to the Geonames
+                                        allCountries.txt
+  -h,--help                             Print this message.
+  -i,--index <directoryPath>            The path to the Lucene index
+                                        directory to either create or read
+  -s,--search <set of location names>   Location names to search the
+                                        Gazetteer for
+ }}}
+ 
+ You will now need to build a Gazetteer using the Geonames.org dataset. 
Instructions are provided below:
+ 
+ {{{
+ $ cd $HOME/src/lucene-geo-gazetteer
+ $ curl -O http://download.geonames.org/export/dump/allCountries.zip
+ $ unzip allCountries.zip
+ $ java -cp target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar 
edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -b allCountries.txt
+ }}}
+ 
+ You can verify that the Gazetteer build worked by searching for, e.g., 
Pasadena and Texas:
+ 
+ {{{
+ $ lucene-geo-gazetteer -s Pasadena Texas
+ }}}
+ 
+ Note that we used the convenience script `lucene-geo-gazetteer`, which assumes 
that you created an index named geoIndex in the 
$HOME/src/lucene-geo-gazetteer/geoIndex directory. We could also have used the 
pure Java command line to search.
+ 
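The pure Java search mentioned above could look roughly like the sketch below. It assumes that `GeoNameResolver` accepts `-i` and `-s` together, as the help output suggests, and that the jar sits under `target/` as in the build step. `gazetteer_search_cmd` is a hypothetical helper, not part of the project; it only prints the command so that it can be inspected before running.

```shell
# Hypothetical helper (not shipped with lucene-geo-gazetteer): prints the
# plain Java search command for a given jar, index directory, and one or
# more location names. It does not run Java itself.
gazetteer_search_cmd() {
  jar=$1
  index=$2
  shift 2
  # -i and -s are the flags listed in the --help output; combining them
  # for a search is an assumption based on that output.
  echo "java -cp $jar edu.usc.ir.geo.gazetteer.GeoNameResolver -i $index -s $*"
}
```

To use it from $HOME/src/lucene-geo-gazetteer, pass the path of the jar-with-dependencies built by `mvn install`, the index directory (geoIndex here), and the names to look up, e.g. `gazetteer_search_cmd <jar> geoIndex Pasadena Texas`, then run the printed command.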
