Is upgrading to Lucene 6 and using points rather than terms an option? Points typically have much lower heap usage; compare GeoPoint (terms-based) with LatLonPoint (points-based) at http://people.apache.org/~mikemccand/geobench.html#reader-heap.
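
For reverse geocoding that would mean reindexing coordinates as LatLonPoint and querying with the point queries. A minimal sketch of what that looks like (field names and coordinates here are made up, and depending on the 6.x release LatLonPoint may live in the lucene-sandbox module):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.LatLonPoint;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class LatLonPointSketch {
      public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory(); // use FSDirectory for a real index
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
          Document doc = new Document();
          // Points are encoded in the on-disk BKD tree, not the terms
          // dictionary, so they do not contribute to the in-heap terms index.
          doc.add(new LatLonPoint("location", 40.7128, -74.0060));
          doc.add(new StoredField("name", "New York"));
          writer.addDocument(doc);
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(reader);
          // Match all documents within 10 km of the query location.
          Query q = LatLonPoint.newDistanceQuery("location", 40.75, -74.0, 10_000.0);
          TopDocs hits = searcher.search(q, 10);
          for (ScoreDoc sd : hits.scoreDocs) {
            System.out.println(searcher.doc(sd.doc).get("name"));
          }
        }
      }
    }

Since the point data lives off-heap and is pulled in on demand, reader heap should no longer scale with the number of unique terms the way it does today.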
On Thu, May 18, 2017 at 02:35, Tom Hirschfeld <tomhirschf...@gmail.com> wrote:
> Hey!
>
> I am working on a Lucene-based service for reverse geocoding. We have a
> large index with lots of unique terms (550 million), and it appears that
> we're running into an issue with memory on our leaf servers, as the term
> dictionary for the entire index is being loaded into heap space. If we
> allocate >65g heap space, our queries return relatively quickly (10s-100s
> of ms), but if we drop below ~65g heap space on the leaf nodes, query time
> degrades dramatically, quickly hitting 20+ seconds (our test harness drops
> queries at 20s).
>
> I did some research and found that in past versions of Lucene one could
> split the loading of the terms dictionary using the 'termInfosIndexDivisor'
> option in the DirectoryReader class. That option was deprecated in Lucene
> 5.0.0
> <https://abi-laboratory.pro/java/tracker/changelog/lucene/5.0.0/log.html>
> in favor of using codecs to achieve similar functionality. Looking at the
> available experimental codecs, I see BlockTreeTermsWriter
> <https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html#BlockTreeTermsWriter(org.apache.lucene.index.SegmentWriteState,%20org.apache.lucene.codecs.PostingsWriterBase,%20int,%20int)>
> which seems like it could be used for a similar purpose: breaking down the
> term dictionary so that we don't load the whole thing into heap space.
>
> Has anyone run into this problem before and found an effective solution?
> Does changing the codec seem appropriate for this issue? If so, how do I
> go about loading an alternative codec and configuring it to my needs? I'm
> having trouble finding docs/examples of how this is used in the real
> world, so even a pointer to a repo or docs somewhere would be appreciated.
> Thanks!
>
> Best,
> Tom Hirschfeld
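
On the codec question: the knob that replaced termInfosIndexDivisor is the pair of block-size arguments that the default postings format passes through to BlockTreeTermsWriter. Larger blocks mean a smaller in-heap terms index (FST), at the cost of more scanning per term lookup. A rough, untested sketch of wiring that up through a custom codec (the class name and block sizes are illustrative, and you would pick the delegate codec matching your Lucene version; Lucene62Codec here assumes 6.2):

    import org.apache.lucene.codecs.FilterCodec;
    import org.apache.lucene.codecs.PostingsFormat;
    import org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat;
    import org.apache.lucene.codecs.lucene62.Lucene62Codec;

    public final class LargeBlockCodec extends FilterCodec {
      // The two ints are minTermBlockSize/maxTermBlockSize, forwarded to
      // BlockTreeTermsWriter; the defaults are 25/48. Larger values shrink
      // the terms index held on heap. 64/128 is illustrative only and
      // should be validated against your own heap/latency measurements.
      private final PostingsFormat postings = new Lucene50PostingsFormat(64, 128);

      public LargeBlockCodec() {
        // The name is written into segment metadata, so readers must be
        // able to resolve it via SPI (see note below).
        super("LargeBlockCodec", new Lucene62Codec());
      }

      @Override
      public PostingsFormat postingsFormat() {
        return postings;
      }
    }

You set it at write time with IndexWriterConfig.setCodec(new LargeBlockCodec()) and register the class name in META-INF/services/org.apache.lucene.codecs.Codec so that readers can find it when opening segments. Note this only affects newly written segments; existing ones keep their current layout until they are merged away or you reindex.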