When trying to facet on 200 million documents with a facet field that has a very large number of unique values, we are running into OOMs. See this thread for background: http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
Otis suggested that using DocValues might solve the memory issues. There seem to be several options for setting the DocValuesFormat. Can someone please clarify what the choices are for Solr 4.6 and what the trade-offs are in terms of memory use and faceting performance?

Without digging into the code and doing some performance testing, it's difficult to understand the existing documentation. I'd really appreciate hearing from people familiar with the issues before I create 3 different indexes of 200 million documents to compare each of the options for DocValuesFormat. Some details of the documentation are appended below.

My apologies if this question should go to the Lucene user list instead of dev. If it should, please let me know, and also let me know how I can tell which list to ask.

Tom Burton-West

------------------------------------------------------

The documentation on the Solr wiki seems to be for Solr 4.2 and seems to contradict the cwiki reference guide:

Cwiki ref guide: https://cwiki.apache.org/confluence/display/solr/DocValues

"The default implementation employs a mixture of loading some things into memory and keeping some on disk. In some cases, however, you may choose to either keep everything on disk or keep it in memory. You can do this by defining docValuesFormat="Disk" or docValuesFormat="Memory" on the field type. This example shows defining the format as "Disk""

Solr Wiki: http://wiki.apache.org/solr/DocValues

docValuesFormat="Lucene42": This is the default, which loads everything into heap memory.
docValuesFormat="Disk": This implementation has a different layout, to try to keep most data on disk but with reasonable performance.
docValuesFormat="SimpleText": Plain-text, slow, and not for production.
On the other hand, the Lucene JavaDocs for Lucene 4.6 show both a DiskDocValuesFormat
http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/diskdv/DiskDocValuesFormat.html
and a DirectDocValuesFormat
http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/memory/DirectDocValuesFormat.html
as well as Lucene4(0|2|5) and PerFieldDocValues formats:
http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/codecs/DocValuesFormat.html?is-external=true
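
For concreteness, here is roughly what I understand the per-field-type setting described in the reference guide to look like. This is only a sketch, not something I have tested: the field type name string_dv_disk and the field name facet_field are made up for illustration, and I am assuming that solrconfig.xml also needs the SchemaCodecFactory for the docValuesFormat attribute to take effect.

```xml
<!-- schema.xml (sketch; type and field names are hypothetical) -->
<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="facet_field" type="string_dv_disk" indexed="true" stored="false" docValues="true"/>

<!-- solrconfig.xml (assumption: needed so the schema can select a per-field format) -->
<codecFactory class="solr.SchemaCodecFactory"/>
```

If that is right, comparing the options would mean changing only the docValuesFormat value (for example "Memory", or omitting it to get the default) and reindexing.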
