Re: Geospatial search in Lucene/Solr

Smiley, David W. Tue, 28 Dec 2010 08:59:39 -0800

Thanks for letting me know about this, Rob.  I think geonames is much simpler
(and much less data) to work with than Wikipedia.  It's plain tab-delimited,
and I like that it includes the population.  I'll press forward with my
patch based on the benchmark module.  I can switch fairly easily between the
lat-lon type and my geohash type because both implement the SpatialQueriable
interface, so I don't need two complete Lucene checkouts.  I had to add Solr
and spatial as dependencies of the benchmark module, but it's worth it to me.
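For readers comparing the two types: SOLR-2155 builds on geohashes. A geohash interleaves longitude and latitude bisection bits and encodes every 5 bits as one base-32 character, so each extra character selects a sub-cell of a hierarchical grid, and points that share a prefix share a cell. The sketch below is the commonly published geohash encoding, not the SOLR-2155 code; the function name `geohash_encode` and its `precision` parameter are hypothetical.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    """Encode a point as a geohash of `precision` characters (hypothetical helper).

    Bits alternate between longitude and latitude, longitude first; each bit
    halves the current interval, so each character narrows the cell 32-fold.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0       # bits accumulated for the current character
    n_bits = 0     # how many of the 5 bits are filled so far
    even = True    # True -> the next bit refines longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        n_bits += 1
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    fine = geohash_encode(57.64911, 10.40744)
    coarse = geohash_encode(57.64911, 10.40744, 4)
    # The coarser cell is simply a prefix of the finer one.
    print(fine, coarse, fine.startswith(coarse))
```

Because a coarse cell is a prefix of every finer cell inside it, a box can roughly be matched by a set of covering cell prefixes instead of two numeric ranges; which approach is faster is exactly what the benchmark is meant to measure.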


~ David

On Dec 28, 2010, at 11:18 AM, Robert Muir wrote:

> On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. <[email protected]> wrote:
>> Presently, I’m working on Lucene’s benchmark contrib module to evaluate the
>> performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon
>> range queries), and then I’ll work on a more efficient probably non-geohash
>> implementation but based on the same underlying concept of a hierarchical
>> grid.  I’m using the geonames.org data set.  Unfortunately, the benchmark
>> code seems very oriented to a generic title-body document whereas I’m
>> looking to create lat-lon pairs… and furthermore to create documents
>> containing multiple lat-lon pairs, and even furthermore a query generator
>> that generates random box queries centered on a random location from the
>> data set.  I seem to be stretching the benchmark framework beyond the
>> use-case it was designed for and so perhaps it won’t be committable but at
>> least I’ll have a patch for other geospatial birds-of-a-feather like you to
>> use.
>> 
>> Stretch away.  The Title/Body orientation is just a relic of what we have
>> done in the past, it doesn't have to stay that way.
> 
> Just for reference, a couple of us are using a Python front-end to
> contrib/benchmark that Mike developed:
> 
> http://code.google.com/p/luceneutil/
> 
> This is nice because it's designed for you to just declare 'competitors'
> (two checkouts of solrcene); then you run the Python script and it gives
> you the relative comparison. Because they are two different checkouts,
> it's simple to compare different approaches, and each checkout can run
> with a different index (e.g. different codecs or testing index format
> changes).
> 
> I thought it might be interesting to you because it tests a variety of
> queries beyond the "standard" set, such as numeric range, sorting,
> primary-key lookup, span queries, etc. The framework also ensures that
> you are bringing back the same results in the same order, runs multiple
> iterations (including iterations in new JVMs), makes it easy to test
> optimized, optimized with deletions, multi-segment, and multi-segment
> with deletions indexes, and can output to txt, HTML, or JIRA format for
> convenience.
> 
> Currently we are generally testing with a line-file format from
> Wikipedia, but besides geonames I wanted to point out that Wikipedia
> does include lat/long information for many articles (it is a major
> source for much of the geonames place data!).
> 
> It would definitely be cool if we could test spatial queries with this
> as well, e.g. by parsing out the lat/long from the Wikipedia XML, adding
> it to the line files, and adding some spatial queries to the default
> list of queries being tested.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
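The query generator described in the quoted message (random box queries centered on a random location from the data set), evaluated the way the LatLon type does it as a pair of range predicates, can be sketched without Lucene as a brute-force reference. A reference like this is also useful for the same-results check luceneutil performs. Assumptions: column positions follow the geonames.org dump layout (latitude, longitude, and population in zero-based columns 4, 5, and 14); the names `parse_geonames_line`, `random_box`, and `match_box` and the `half_deg` parameter are hypothetical; boxes near the poles or dateline are clamped rather than split.

```python
import random
from typing import Iterable, List, NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float
    population: int


class Box(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def parse_geonames_line(line: str) -> Place:
    """Parse one tab-delimited geonames row (assumed dump column layout)."""
    cols = line.rstrip("\n").split("\t")
    pop = cols[14] if len(cols) > 14 and cols[14] else "0"
    return Place(cols[1], float(cols[4]), float(cols[5]), int(pop))


def random_box(places: List[Place], half_deg: float, rng: random.Random) -> Box:
    """Box of +/- half_deg degrees centered on a randomly chosen place.

    Clamped to valid lat/lon ranges; dateline-crossing boxes are not split.
    """
    c = rng.choice(places)
    return Box(max(-90.0, c.lat - half_deg), min(90.0, c.lat + half_deg),
               max(-180.0, c.lon - half_deg), min(180.0, c.lon + half_deg))


def match_box(places: Iterable[Place], box: Box) -> List[Place]:
    """Brute-force stand-in for a LatLon-style query: two ranges ANDed."""
    return [p for p in places
            if box.min_lat <= p.lat <= box.max_lat
            and box.min_lon <= p.lon <= box.max_lon]
```

Seeding the `random.Random` instance keeps the generated boxes identical across competitors, so each checkout answers the same queries. The actual Lucene queries for either spatial type are not shown here.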
