We've written our own custom sorter to be able to sort on the latitude and longitude fields of the results. The index is about 18 million records and about 12GB on disk. We allocated about 3GB of heap, and with roughly one request to the index every 2 or 3 seconds we would run out of memory about once every 1.5 hours. We modified our custom sort comparators to implement the equals and hashCode methods and used a WeakHashMap to cache doc ids and their lat/lon values. We ran some tests and it started to reuse those comparators, and now it will go for maybe 6 to 9 hours before running out of memory; however, it is still running out of memory.
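For context, the general shape of what we are doing is something like the sketch below. This is a simplified illustration rather than our actual code: DistanceComparatorSource and lookupLatLon are made-up names, and it assumes the Lucene 2.x SortComparatorSource / ScoreDocComparator API. The equals and hashCode methods are what allow the comparator cache to find a previously built comparator for the same reader and field.

import java.io.IOException;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

// Illustrative comparator source (names are placeholders).
public class DistanceComparatorSource implements SortComparatorSource {

    private final double originLat;
    private final double originLon;

    public DistanceComparatorSource(double originLat, double originLon) {
        this.originLat = originLat;
        this.originLon = originLon;
    }

    public ScoreDocComparator newComparator(final IndexReader reader, final String fieldname)
            throws IOException {
        // doc id -> lat/lon, keyed weakly so entries can be collected
        // once nothing else references the boxed doc id.
        final Map<Integer, double[]> cache = new WeakHashMap<Integer, double[]>();

        return new ScoreDocComparator() {
            public int compare(ScoreDoc i, ScoreDoc j) {
                return Double.compare(distance(i.doc), distance(j.doc));
            }

            public Comparable sortValue(ScoreDoc i) {
                return Double.valueOf(distance(i.doc));
            }

            public int sortType() {
                return SortField.CUSTOM;
            }

            private double distance(int doc) {
                double[] latLon = cache.get(doc);
                if (latLon == null) {
                    latLon = lookupLatLon(reader, doc);
                    cache.put(doc, latLon);
                }
                // placeholder metric; the real distance calculation goes here
                return Math.hypot(latLon[0] - originLat, latLon[1] - originLon);
            }
        };
    }

    private double[] lookupLatLon(IndexReader reader, int doc) {
        // stub: read the stored lat/lon for this document from the index
        return new double[] { 0.0, 0.0 };
    }

    // equals/hashCode let the FieldSortedHitQueue comparator cache
    // reuse an existing comparator instead of creating a new one per query.
    public boolean equals(Object o) {
        if (!(o instanceof DistanceComparatorSource)) return false;
        DistanceComparatorSource other = (DistanceComparatorSource) o;
        return originLat == other.originLat && originLon == other.originLon;
    }

    public int hashCode() {
        return Double.valueOf(originLat).hashCode() ^ Double.valueOf(originLon).hashCode();
    }
}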
The only other object in our sorter that isn't in the WeakHashMap is an instance of the index reader. We keep that reader as an instance variable in order to get the latitude and longitude values to sort on. Could holding a reference to the index reader be causing the memory leak? We were considering making the index reader instance a WeakReference. When we did some profiling, we noticed that shortly before it runs out of memory the org.apache.lucene.search.FieldSortedHitQueue class was taking up 331MB and the org.apache.lucene.index.IndexReader[23] class was taking up 111MB of memory. The [23] means there were 23 instances.
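In case it makes the question clearer, the change we were considering looks roughly like the sketch below. LatLonLookup and latLonFor are made-up names; the only point is wrapping the reader in a java.lang.ref.WeakReference instead of holding it in a strong instance field, so our sorter alone cannot keep an old reader alive.

import java.lang.ref.WeakReference;

import org.apache.lucene.index.IndexReader;

// Sketch of the change under consideration (placeholder names).
public class LatLonLookup {

    private final WeakReference<IndexReader> readerRef;

    public LatLonLookup(IndexReader reader) {
        // weak reference: the sorter no longer pins the reader in memory
        this.readerRef = new WeakReference<IndexReader>(reader);
    }

    public double[] latLonFor(int doc) {
        IndexReader reader = readerRef.get();
        if (reader == null) {
            // the reader was garbage collected; the caller would have to
            // obtain a current reader and retry
            throw new IllegalStateException("IndexReader has been collected");
        }
        // stub: read the stored lat/lon fields for this doc via the reader
        return new double[] { 0.0, 0.0 };
    }
}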