There is a ton of misinformation in this thread. As of Lucene 4.5, the default docvalues format is mostly disk-based (only some small structures are kept in RAM). You probably don't need to change anything from the defaults, unless:
if you want everything in RAM, use Memory.
If you want to waste RAM, use Direct.
If you have no RAM, use Disk.

On Fri, Jan 31, 2014 at 4:15 PM, Tom Burton-West <[email protected]> wrote:
> When trying to facet on 200 million documents with a facet field that has
> a very large number of unique values, we are running into OOM's. See this
> thread for background:
>
> http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
>
> Otis suggested that using DocValues might solve the memory issues.
>
> There seem to be several options for setting the DocValuesFormat. Can
> someone please clarify what the choices are for Solr 4.6 and what the
> trade-offs are in terms of memory use and faceting performance?
>
> Without digging into the code and doing some performance testing it's
> difficult to understand the existing documentation. I'd really appreciate
> hearing from people familiar with the issues before I create 3 different
> indexes of 200 million documents to compare each of the options for
> DocValuesFormat.
>
> Some details of the documentation are appended below.
>
> My apologies if this question should go to Lucene user instead of dev. If
> it should, please let me know and also let me know how I can tell which
> list to ask.
>
>
> Tom Burton-West
>
> ------------------------------------------------------
>
> The documentation on the Solr wiki seems to be for Solr 4.2 and seems to
> contradict the cwiki reference guide:
>
> Cwiki ref guide:
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> "The default implementation employs a mixture of loading some things into
> memory and keeping some on disk. In some cases, however, you may choose to
> either keep everything on disk or keep it in memory. You can do this by
> defining docValuesFormat="Disk" or docValuesFormat="Memory" on the field
> type.
> This example shows defining the format as "Disk"
>
> Solr Wiki:
> http://wiki.apache.org/solr/DocValues
> docValuesFormat="Lucene42": This is the default, which loads everything
> into heap memory.
>
> docValuesFormat="Disk": This implementation has a different layout, to
> try to keep most data on disk but with reasonable performance.
>
> docValuesFormat="SimpleText": Plain-text, slow, and not for production.
>
> On the other hand, the Lucene JavaDocs for Lucene 4.6 show both a
> DiskDocValuesFormat
> http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/diskdv/DiskDocValuesFormat.html
>
> and a DirectDocValuesFormat
> http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/memory/DirectDocValuesFormat.html
>
> as well as Lucene4(0|2|5) and PerFieldDocValues Formats.
>
> http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/codecs/DocValuesFormat.html?is-external=true
>
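
For anyone who does want to override the default, here is a rough sketch of how the three non-default choices (Memory, Direct, Disk) might be declared per field type in a Solr 4.x schema. The field type names are made up for illustration, and the codecFactory line reflects my recollection that per-field formats are only honored when the schema-aware codec factory is enabled; please check both against the reference guide for your exact version before relying on them.

```xml
<!-- solrconfig.xml (assumption: needed so that docValuesFormat on a
     field type is actually honored) -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: the field type names below are hypothetical -->

<!-- Everything in RAM -->
<fieldType name="string_dv_memory" class="solr.StrField"
           docValues="true" docValuesFormat="Memory"/>

<!-- Uncompressed in RAM: fastest access, largest heap footprint -->
<fieldType name="string_dv_direct" class="solr.StrField"
           docValues="true" docValuesFormat="Direct"/>

<!-- As much as possible left on disk -->
<fieldType name="string_dv_disk" class="solr.StrField"
           docValues="true" docValuesFormat="Disk"/>
```

Leaving docValuesFormat off entirely and setting only docValues="true" gives the default format, which is what most setups should stay with. Changing the format on an existing field requires reindexing.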
