[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256 ]

Bill Bell commented on SOLR-2155:
---------------------------------

I did more research. You cannot get from a doc to multiple values in the field 
cache for a field; as far as I can see, that mapping does not exist. The 
"docToTermOrd" property (type Direct8) is an array indexed by document ID that 
holds a single value (the term ord). Since there is only one value per 
document, there is no easy way to get a list. It was created to make it easy to 
count documents for facets (does the field have 1 value or more?). I could do 
something like the following (but it would be really slow).

Document doc = searcher.doc(id, fields);

It would be better if you copied each lat/lon into the index with a prefix 
added to the sfield, like "store_1", "store_2", "store_3", when you index the 
values. Then I could grab them easily. Of course, you could also just store 
them all in one field the way I did, but name it store_1 : "lat,lon|lat,lon". 
If we did this during indexing, the bar-delimited approach would be easier for 
people to use (they would not have to copy the values themselves). Asking for 
2, 3, or 4 term lists by document ID is probably slower than just doing the 
"|" separation.
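To make the bar-delimited idea concrete, here is a minimal sketch of parsing a 
"lat,lon|lat,lon" stored value and checking each point against a radius. The 
class and method names are illustrative only (not from the patch), and the 
distance uses a plain haversine (hsin) formula rather than Solr's DistanceUtils:

```java
// Hypothetical sketch: parse a "lat,lon|lat,lon" stored field value and test
// each point against a query radius. Names are illustrative, not from SOLR-2155.
public class BarDelimitedPoints {
    static final double EARTH_RADIUS_KM = 6371.0087714;

    /** Parses "lat,lon|lat,lon|..." into an [n][2] array of {lat, lon}. */
    static double[][] parsePoints(String stored) {
        String[] pairs = stored.split("\\|");
        double[][] points = new double[pairs.length][2];
        for (int i = 0; i < pairs.length; i++) {
            int comma = pairs[i].indexOf(',');
            points[i][0] = Double.parseDouble(pairs[i].substring(0, comma));
            points[i][1] = Double.parseDouble(pairs[i].substring(comma + 1));
        }
        return points;
    }

    /** Great-circle distance in km via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if any stored point lies within radiusKm of the query point. */
    static boolean anyWithin(String stored, double qLat, double qLon, double radiusKm) {
        for (double[] p : parsePoints(stored)) {
            if (haversineKm(p[0], p[1], qLat, qLon) <= radiusKm) {
                return true;
            }
        }
        return false;
    }
}
```

The filter would call anyWithin() once per candidate document, so the per-call 
parsing cost is exactly what improvement #4 below is aimed at.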

I keep going back to my patch, and I think it is still pretty good. I hope 
others have not gone down this same path, since it was not fun.

Potential improvements:

1. Auto-populate sfieldmulti with "|"-delimited values when indexing the geohash field
2. Multi-thread the brute-force scan over the lat/lons
3. Use DistanceUtils for hsin
4. Remove split() to improve performance
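For item 4, a possible shape of the fix is to walk the "lat,lon|lat,lon" string 
with indexOf() instead of String.split(), avoiding the regex machinery and the 
intermediate array allocation on every document. This is a hypothetical sketch; 
the class and callback names are made up for illustration:

```java
// Hypothetical sketch for improvement #4: scan the bar-delimited string with
// indexOf() rather than split(). Names are illustrative, not from the patch.
public class NoSplitParser {
    interface PointConsumer { void point(double lat, double lon); }

    /** Invokes the callback once per "lat,lon" pair; returns the pair count. */
    static int forEachPoint(String stored, PointConsumer consumer) {
        int count = 0;
        int start = 0;
        while (start < stored.length()) {
            int bar = stored.indexOf('|', start);
            int end = (bar == -1) ? stored.length() : bar;
            int comma = stored.indexOf(',', start);
            double lat = Double.parseDouble(stored.substring(start, comma));
            double lon = Double.parseDouble(stored.substring(comma + 1, end));
            consumer.point(lat, lon);
            count++;
            start = end + 1;
        }
        return count;
    }
}
```

Because nothing is buffered, each point can be distance-checked as soon as it 
is parsed, and the loop can bail out early on the first match.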

Bill



> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (i.e. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if it matches.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox.... to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
