Srikanth,

Have you tried basic profiling, or even just sampling? Take a few thread dumps with jstack while indexing is running. If the code is that CPU-hungry, it will show up in the stacks.
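You can also automate the "take a few thread dumps" step from inside the JVM. The JDK exposes stack snapshots through `Thread.getAllStackTraces()`, and comparing several of them is the same idea as running jstack against the process a few times, a second or so apart. The sketch below assumes you can run code inside the indexing JVM, for example from a test harness. `StackSampler`, `sample()` and `burnCpu()` are hypothetical names, and `burnCpu()` only stands in for whatever slow indexing code you are investigating. The sampler counts how often each method is the top frame of a runnable thread. Methods that dominate across many samples are the likely CPU hogs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Crude in-process stack sampler ("poor man's profiler"): snapshots all thread
 * stacks several times and counts how often each method is the top frame of a
 * RUNNABLE thread. Hypothetical helper, not part of Solr or Lucene.
 */
class StackSampler {

    // Written by the demo busy loop so its work is not trivially dead code.
    static volatile long sink;

    /** Takes up to {@code samples} snapshots, {@code intervalMillis} apart, and returns top-frame counts. */
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                StackTraceElement[] stack = e.getValue();
                // Skip the sampler itself, threads with no Java frames, and threads that
                // are not RUNNABLE (the state is read after the snapshot, so this is approximate).
                if (t == Thread.currentThread() || stack.length == 0
                        || t.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                break;
            }
        }
        return counts;
    }

    /** Hypothetical stand-in for CPU-heavy indexing code: a loop that never blocks. */
    static void burnCpu() {
        long x = 0;
        while (true) {
            x = x * 31 + 7;
            sink = x;
        }
    }

    public static void main(String[] args) {
        Thread busy = new Thread(StackSampler::burnCpu, "busy-worker");
        busy.setDaemon(true); // lets the JVM exit when main returns
        busy.start();

        Map<String, Integer> counts = sample(20, 50);
        // Print the five most frequently sampled top frames.
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Counting only the top frame keeps the sketch small. For a real investigation, the full stacks from jstack (or a proper profiler) tell you which caller is responsible, not just which leaf method is hot.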
Regards

On Thu, Dec 8, 2011 at 8:57 PM, Srikanth Kallurkar (Commented) (JIRA) <j...@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165326#comment-13165326 ]
>
> Srikanth Kallurkar commented on SOLR-2155:
> ------------------------------------------
>
> In my use case, I have a large number of lat-lons for each document, on
> the order of about 2K lat-lon pairs. Since we started using the geohash
> prefix filter, indexing time has degraded significantly, by a factor of
> about 2-3. Are there any suggestions for speeding up the indexing process?
> I was trying to read the comments here, but am not sure whether any
> index-time caching mechanism is used (or could be used) to look up
> geohashes.
>
> Thanks,
> Srikanth
>
> > Geospatial search using geohash prefixes
> > ----------------------------------------
> >
> >                 Key: SOLR-2155
> >                 URL: https://issues.apache.org/jira/browse/SOLR-2155
> >             Project: Solr
> >          Issue Type: Improvement
> >            Reporter: David Smiley
> >         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> >                      GeoHashPrefixFilter.patch,
> >                      SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch,
> >                      SOLR.2155.p3.patch, SOLR.2155.p3tests.patch,
> >                      Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip,
> >                      Solr2155-for-1.0.2-3.x-port.patch
> >
> >
> > There currently isn't a solution in Solr for doing geospatial filtering
> > on documents that have a variable number of points. This scenario occurs
> > when there is location extraction (i.e. via a "gazetteer") occurring on
> > free text. None, one, or many geospatial locations might be extracted
> > from any given document, and users want to limit their search results to
> > those occurring in a user-specified area.
> >
> > I've implemented this by furthering the GeoHash based work in Lucene/Solr
> > with a geohash prefix based filter. A geohash refers to a lat-lon box on
> > the earth. Each successive character added further subdivides the box
> > into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid.
> > The first step in this scheme is figuring out which geohash grid squares
> > cover the user's search query. I've added various extra methods to
> > GeoHashUtils (and added tests) to assist in this purpose. The next step
> > is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash
> > prefixes in TermsEnum.seek() to skip to relevant grid squares in the
> > index. Once a matching geohash grid is found, the points therein are
> > compared against the user's query to see whether they match. I created an
> > abstraction GeoShape extended by subclasses named PointDistance... and
> > CartesianBox.... to support different queried shapes so that the filter
> > need not care about these details.
> >
> > This work was presented at LuceneRevolution in Boston on October 8th.

-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
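Here is a concrete illustration of the quoted description of geohash cells. A standard geohash interleaves longitude and latitude bits, starting with longitude, and packs 5 bits into each base32 character. One character therefore adds 3 longitude bits and 2 latitude bits (an 8x4 split), and the next adds 2 and 3 (a 4x8 split), alternating with the length. The sketch below is a hypothetical illustration of that standard encoding. `GeohashSketch`, `encode()` and `lonLatBits()` are made-up names, and this is not Lucene's GeoHashUtils or the GeoHashPrefixFilter patch.

```java
/**
 * Minimal geohash encoder showing why each extra character splits the current
 * cell into an 8x4 or 4x8 grid. Hypothetical sketch, not Lucene's GeoHashUtils.
 */
class GeohashSketch {
    private static final char[] BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz".toCharArray();

    /** Encodes a point as a geohash of the given length; bits alternate lon, lat, lon, ... */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true; // a geohash starts with a longitude bit
        int bits = 0;
        int ch = 0;
        while (sb.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } else { ch <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; } else { ch <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // 5 bits make one base32 character
                sb.append(BASE32[ch]);
                bits = 0;
                ch = 0;
            }
        }
        return sb.toString();
    }

    /** Returns {longitude bits, latitude bits} encoded by a geohash of the given length. */
    static int[] lonLatBits(int length) {
        int total = 5 * length;
        int lonBits = (total + 1) / 2; // longitude gets the extra bit when the total is odd
        return new int[] {lonBits, total - lonBits};
    }

    public static void main(String[] args) {
        // Grid size after each character: the per-character split alternates 8x4 and 4x8.
        for (int len = 1; len <= 4; len++) {
            int[] b = lonLatBits(len);
            System.out.printf("len=%d: %d columns x %d rows%n", len, 1 << b[0], 1 << b[1]);
        }
        System.out.println(encode(57.64911, 10.40744, 11));
    }
}
```

Running `main` prints 8 columns x 4 rows for one character, 32 x 32 for two, 256 x 128 for three and 1024 x 1024 for four. For a given point, a longer geohash always extends the shorter one. That prefix property is what lets a filter like the one described seek to a grid square's prefix in the terms dictionary and skip everything else.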