Hej Paul, I have implemented the DistanceComparatorSource > example from Lucene In Action (my Bible) and it works > great. We are now in the situation where we have > nearly a million documents in our index and the > performance of this implementation has degraded. >
I have had the same problem with DistanceComparatorSource from Lucene In Action. After doing some profiling found that if you do not implement equals and hashcode in at least class that implments SortComparatorSort a memory leak is created. The sorting api in lucene keeps a cache of SortComparators in org.apache.lucene.search.FieldCacheImpl. This cache is based on 3 things 1) IndexReader 2) Field you are sorting 3) Compartor you are using If your compartor does not implement equals and hashcode your are getting penalized twice as the int[] you are using is being created *everytime* the sort is used and the internal cache in FieldCacheImpl grows overtime. We had a similar degradation when using lucene until we implemented a equals and hascode in out SortComparatorSort. We were using lucene-1.4.3 at the time and tried different combinations of versions of java(1.4.2 compared to 1.5) and lucene. In our environment we found that the best increase was by upgrading to lucene-1.9.1. A little more info can be found here http://www.lucenebook.com/blog/errata/ /Brian