Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "GeoTopicParser" page has been changed by MadhavSharan:
https://wiki.apache.org/tika/GeoTopicParser?action=diff&rev1=10&rev2=11

  usage: lucene-geo-gazetteer
   -b,--build <gazetteer file>           The Path to the Geonames
                                         allCountries.txt
+  -c,--count <number of results>        Number of best results to be
+                                        returned for one location
   -h,--help                             Print this message.
   -i,--index <directoryPath>            The path to the Lucene index
                                         directory to either create or read
+  -json,--json                          Formats output in well defined json
+                                        structure
   -s,--search <set of location names>   Location names to search the
                                         Gazetteer for
+  -server,--server                      Launches Geo Gazetteer Service
+ 
  }}}
  
  You will now need to build a Gazetteer using the Geonames.org dataset. Instructions are provided below. Note that you will need at least 1.2 GB of disk space to build the Lucene index for the Gazetteer.
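If you script the build, checking free space before indexing can save a failed run. Below is a minimal Python sketch using only the standard library. `has_room_for_index` is a hypothetical helper and is not part of lucene-geo-gazetteer. It also assumes that the 1.2 GB figure above means decimal gigabytes.

```python
import shutil

# Rough space needed for the Gazetteer's Lucene index, per this page.
# Assumption: "1.2 GB" is read as decimal gigabytes (1.2 * 10**9 bytes).
REQUIRED_BYTES = 1_200_000_000


def has_room_for_index(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, `has_room_for_index(os.path.expanduser("~"))` checks the filesystem holding your home directory. Pass the directory where you actually plan to create the index.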
@@ -44, +50 @@

  You can verify that the Gazetteer build worked by searching, e.g., for Pasadena and/or Texas:
  
  {{{
- $ lucene-geo-gazetteer -s Pasadena Texas
+ $ lucene-geo-gazetteer -s Pasadena Texas -json
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
- [
- {"Texas" : [
- "Texas",
- "-91.92139",
- "18.05333"
- ]},
- {"Pasadena" : [
- "Pasadena",
- "-74.06446",
- "4.6964"
- ]}
- ]
  }}}
  
+ Now you need to start the REST service of lucene-geo-gazetteer, which Tika uses internally:
+ 
+ {{{
+ $ lucene-geo-gazetteer -server
+ }}}
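The service takes a moment to start. A script that launches it may therefore want to wait until it is listening before it sends queries. Below is a minimal Python sketch. It assumes the service listens on localhost port 8765, as in the curl example on this page. `wait_for_service` is a hypothetical helper. It only checks that a TCP connection succeeds and does not confirm that the index has finished loading.

```python
import socket
import time


def wait_for_service(host: str = "localhost", port: int = 8765,
                     timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse.

    Assumption: port 8765 is taken from the curl example on this page.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is accepting connections on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```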
+ 
+ You can verify that the REST API is responding by searching, e.g., for Pasadena and/or Texas:
+ 
+ {{{
+ $ curl "http://localhost:8765/api/search?s=Pasadena&s=Texas"
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
+ }}}
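You can also issue the same query from code. Below is a minimal Python sketch that uses only the standard library. It assumes the endpoint and the repeated `s` parameter shown in the curl example above. `build_search_url` and `search` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint copied from the curl example on this page.
BASE_URL = "http://localhost:8765/api/search"


def build_search_url(names, base_url=BASE_URL):
    """Repeat the `s` query parameter once per location name."""
    return base_url + "?" + urlencode([("s", name) for name in names])


def search(names, base_url=BASE_URL, timeout=10.0):
    """Query the running gazetteer service and return the decoded JSON response."""
    with urlopen(build_search_url(names, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the service running, `search(["Pasadena", "Texas"])` should return a dict with the same shape as the curl output. Note that `urlencode` encodes spaces as `+`.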
+ 
- Note that we used the convenience script `lucene-geo-gazetteer` which assumes 
that you created an indexed named geoIndex in the 
$HOME/src/lucene-geo-gazetter/geoIndex directory. We could have also used the 
pure Java command line to search. The return from the Gazetteer is a JSON List 
of JSON Object structures in which the structure is a key->JSON List map. The 
key is the location name given and the List is a list of closest match (by Edit 
Distance) in the Gazetteer for that name, followed by Latitude, and Longitude 
of that location.
+ Note that we used the convenience script `lucene-geo-gazetteer`, which assumes that you created an index named geoIndex in the $HOME/src/lucene-geo-gazetteer/geoIndex directory. We could also have used the pure Java command line to search. The Gazetteer returns a JSON object that maps each location name given to a list of the most popular location objects in the Gazetteer for that name (on the command line, the -c option sets how many are returned per name). Each location object carries the name, countryCode, admin1Code, admin2Code, latitude and longitude of that location.
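The sketch below turns that JSON response into typed records, which makes the structure easier to see. The field names come from the sample output on this page. The `Location` class and `parse_gazetteer_response` function are hypothetical, and the sketch assumes every match carries all six fields.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Location:
    """One gazetteer match; fields mirror the JSON keys in the sample output."""
    name: str
    country_code: str
    admin1_code: str
    admin2_code: str
    latitude: float
    longitude: float


def parse_gazetteer_response(text: str) -> Dict[str, List[Location]]:
    """Map each queried location name to its list of matching Location records."""
    raw = json.loads(text)
    return {
        query: [
            Location(
                name=item["name"],
                country_code=item["countryCode"],
                admin1_code=item["admin1Code"],
                admin2_code=item["admin2Code"],
                latitude=float(item["latitude"]),
                longitude=float(item["longitude"]),
            )
            for item in matches
        ]
        for query, matches in raw.items()
    }
```

Fed the Pasadena/Texas output above, the result is keyed by "Pasadena" and "Texas". Each key holds a one-element list.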
  
  == Installing and downloading an NER model ==
  
