Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "GeoTopicParser" page has been changed by MadhavSharan:
https://wiki.apache.org/tika/GeoTopicParser?action=diff&rev1=10&rev2=11

  usage: lucene-geo-gazetteer
   -b,--build <gazetteer file>           The Path to the Geonames allCountries.txt
+  -c,--count <number of results>        Number of best results to be
+                                        returned for one location
   -h,--help                             Print this message.
   -i,--index <directoryPath>            The path to the Lucene index directory to either create or read
+  -json,--json                          Formats output in well defined json
+                                        structure
   -s,--search <set of location names>   Location names to search the Gazetteer for
+  -server,--server                      Launches Geo Gazetteer Service
  }}}
  
  You will now need to build a Gazetteer using the Geonames.org dataset. Instructions are provided below. Note that you will need at least 1.2 GB of disk space to build the Lucene index for the Gazetteer.
@@ -44, +50 @@
  
  You can verify that the Gazetteer build worked by searching, e.g., for Pasadena and/or Texas:
  {{{
- $ lucene-geo-gazetteer -s Pasadena Texas
+ $ lucene-geo-gazetteer -s Pasadena Texas -json
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
- [
-  {"Texas" : [
-  "Texas",
-  "-91.92139",
-  "18.05333"
-  ]},
-  {"Pasadena" : [
-  "Pasadena",
-  "-74.06446",
-  "4.6964"
-  ]}
- ]
  }}}
  
+ Now you need to start the REST service of lucene-geo-gazetteer. Tika uses this service internally.
+ 
+ {{{
+ $ lucene-geo-gazetteer -server
+ }}}
+ 
+ You can verify that the REST API is responding by searching, e.g., for Pasadena and/or Texas:
+ 
+ {{{
+ $ curl "http://localhost:8765/api/search?s=Pasadena&s=Texas"
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
+ }}}
+ 
- Note that we used the convenience script `lucene-geo-gazetteer` which assumes that you created an indexed named geoIndex in the $HOME/src/lucene-geo-gazetter/geoIndex directory. We could have also used the pure Java command line to search. The return from the Gazetteer is a JSON List of JSON Object structures in which the structure is a key->JSON List map. The key is the location name given and the List is a list of closest match (by Edit Distance) in the Gazetteer for that name, followed by Latitude, and Longitude of that location.
+ Note that we used the convenience script `lucene-geo-gazetteer`, which assumes that you created an index named geoIndex in the $HOME/src/lucene-geo-gazetter/geoIndex directory. We could also have used the pure Java command line to search. The Gazetteer returns a JSON object that maps each location name given to a JSON list of location objects. Each list holds the most popular locations in the Gazetteer for that name (on the command line, -c sets how many are returned per name), and each location object carries its name, countryCode, admin1Code, admin2Code, latitude and longitude.
  
  == Installing and downloading an NER model ==
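
Tika talks to the Gazetteer REST service internally, but the same search endpoint can also be queried from your own Java code. The following is a minimal sketch, not Tika's actual client: it assumes the service is reachable at http://localhost:8765 (the address used in the curl example in the diff) and that /api/search takes one repeated `s` parameter per location name, as that example shows. The class and method names (`GazetteerSearchSketch`, `buildSearchUri`, `search`) are hypothetical, and `java.net.http.HttpClient` requires Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client sketch for the lucene-geo-gazetteer REST service.
// Assumption: the service listens on http://localhost:8765 and exposes
// GET /api/search with one repeated "s" parameter per location name.
public class GazetteerSearchSketch {

    // Builds e.g. http://localhost:8765/api/search?s=Pasadena&s=Texas.
    // Each name is form-encoded (a space becomes '+'), so multi-word
    // names such as "New York" produce a valid query string.
    static URI buildSearchUri(String baseUrl, List<String> names) {
        String query = names.stream()
                .map(n -> "s=" + URLEncoder.encode(n, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/api/search?" + query);
    }

    // Sends the GET request and returns the raw JSON body, which maps each
    // location name to a list of location objects.
    static String search(String baseUrl, List<String> names)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildSearchUri(baseUrl, names))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Gazetteer returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires "lucene-geo-gazetteer -server" to be running locally.
        System.out.println(search("http://localhost:8765", List.of("Pasadena", "Texas")));
    }
}
```

The sketch returns the body as a string; turning it into a name to list-of-locations structure would need a JSON library, which is left out here to keep the example dependency-free.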
