Hello,

We recently added distance searching and sorting to an existing Lucene index, using the Spatial contrib for Lucene 2.9.4. The index holds 100K+ documents, of which initially only 20K had latitude/longitude and _tier_* fields. With that data, spatial queries ran acceptably.
After we enriched the index with geo coordinates for most of the documents, all queries that use the spatial distance filter plus distance sorting started to run forever. Details of the implementation are below. Do you have any idea what could cause this problem?

Environment Details
------------------
Lucene 2.9
Java 1.6.0_14
JAVA_OPTS=-Xms8000M -Xmx8000M -server -XX:-UseParallelOldGC -XX:+PrintCommandLineFlags -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+DisableExplicitGC -Xloggc:gc.log
CentOS release 5.5 (Final)
8 cores server (physical machine)
18GB RAM
RAID5 HDD
(on this machine only Apache Web Server is running at the moment)

Implementation Details
------------------
The implementation is based on this blog post:
http://develop.nydi.ch/2010/10/lucene-spatial-example/

While a spatial query executes, processor usage rises to the maximum and stays there for hours. A thread dump shows the following:

"searchers-thread-63" prio=10 tid=0x00000000488e4800 nid=0x3dab runnable [0x0000000046789000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.put(HashMap.java:374)
    at org.apache.lucene.spatial.tier.LatLongDistanceFilter$1.match(LatLongDistanceFilter.java:97)
    at org.apache.lucene.search.FilteredDocIdSet$1.match(FilteredDocIdSet.java:73)
    at org.apache.lucene.search.FilteredDocIdSetIterator.advance(FilteredDocIdSetIterator.java:87)
    at org.apache.lucene.util.OpenBitSetDISI.inPlaceAnd(OpenBitSetDISI.java:66)
    at org.apache.lucene.misc.ChainedFilter.doChain(ChainedFilter.java:253)
    at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:177)
    at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:104)
    at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:277)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:258)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
    at org.apache.lucene.search.Searcher.search(Searcher.java:90)
    at com.yc.cyclone.connector.lucene.NewLuceneConnector.executeSearch(NewLuceneConnector.java:730)
    at com.yc.cyclone.connector.lucene.NewLuceneConnector.access$000(NewLuceneConnector.java:33)
    at com.yc.cyclone.connector.lucene.NewLuceneConnector$2.run(NewLuceneConnector.java:884)
    at javolution.context.ConcurrentContext$Default.executeAction(ConcurrentContext.java:358)
    at javolution.context.ConcurrentContext.execute(ConcurrentContext.java:271)
    at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:879)
    at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:782)
    at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:667)
    at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:662)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
    at com.yc.cyclone.services.concurrency.WorkerThread.run(WorkerThread.java:49)

Interestingly, even though the processor was at 100% the whole time, other (non-spatial) searches and indexing tasks were processed by Lucene without any problem and without a noticeable performance decrease.
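A possible angle on the dump, not verified: the hot frame is java.util.HashMap.put inside the distance filter, and our parallel searches share one filter instance (described below). java.util.HashMap is not synchronized, so concurrent puts on a shared map are unsafe. The Lucene-free sketch below shows the alternative arrangement, in which every search task builds its own filter object. StatefulDistanceFilter, runSearches and PerSearchFilterSketch are hypothetical stand-ins made up for illustration, and it is an assumption that the real filter stores per-document distances in a plain HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerSearchFilterSketch {

    /** Hypothetical stand-in for a filter that caches docId -> distance while matching. */
    static final class StatefulDistanceFilter {
        // Plain HashMap: unsafe if several threads call match() on the same instance.
        private final Map<Integer, Double> distances = new HashMap<Integer, Double>();

        boolean match(int docId, double distanceMiles, double radiusMiles) {
            if (distanceMiles > radiusMiles) {
                return false;
            }
            distances.put(docId, distanceMiles); // state mutated during matching
            return true;
        }

        int cachedCount() {
            return distances.size();
        }
    }

    /** Runs the given number of searches in parallel; each task owns its filter, so no map is shared. */
    static List<Integer> runSearches(int searches, final int docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int i = 0; i < searches; i++) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        // One filter per search task instead of one shared instance.
                        StatefulDistanceFilter filter = new StatefulDistanceFilter();
                        for (int doc = 0; doc < docs; doc++) {
                            // Fake distance in [0, 99]; radius of 50 keeps 51 of every 100 docs.
                            filter.match(doc, doc % 100, 50.0);
                        }
                        return filter.cachedCount();
                    }
                }));
            }
            List<Integer> counts = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runSearches(8, 1000));
    }
}
```

In this arrangement every task ends up with its own independent cache, and no map is ever written by more than one thread. Whether the same change applies to the real filter and sort comparator would need to be checked against the Spatial contrib source.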
We execute multiple queries in parallel (they differ in one search parameter), and they all reuse the same filter, which in this case is:

    new ChainedFilter(
        new Filter[] {nonSpatialQueryFilter, distanceQueryBuilder.getFilter()},
        ChainedFilter.AND);

For sorting we use:

    new DistanceFieldComparatorSource(distanceQueryBuilder.getDistanceFilter());

Here is one entry from the index (spatial fields only):

    Field      Value
    _tier_10   0.0
    _tier_11   1.0001
    _tier_12   2.0003
    _tier_13   4.0006
    _tier_14   9.00013
    _tier_15   18.00027
    _tier_7    0.0
    _tier_8    0.0
    _tier_9    0.0
    lat        47.61242
    lng        8.54002

Note that these fields are indexed as numeric fields; I used NumericUtils.prefixCodedToDouble(field.stringValue()) to print the values. There are also documents that do not have these fields indexed.

Thank you.

Best Regards,
Drazen

--
View this message in context: http://lucene.472066.n3.nabble.com/SPATIAL-Spatial-search-runs-forever-tp3258018p3258018.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.