When trying to facet on 200 million documents with a facet field that has a
very large number of unique values, we are running into OOMs (out-of-memory
errors). See this thread for background:
http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html

Otis suggested that using DocValues might solve the memory issues.

There seem to be several options for setting the DocValuesFormat.  Can
someone please clarify what the choices are for Solr 4.6 and what the
trade-offs are in terms of memory use and faceting performance?

Without digging into the code and doing some performance testing, it's
difficult to make sense of the existing documentation. I'd really appreciate
hearing from people familiar with the issues before I create three different
indexes of 200 million documents to compare each of the DocValuesFormat
options.
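For a rough sense of scale before building those indexes, here is a
back-of-envelope sketch. It assumes (an assumption, not a statement about any
particular DocValuesFormat's layout) a single-valued field where each document
stores one ordinal as a packed integer just wide enough for the number of
unique values. It ignores the term dictionary, multi-valued fields, and any
per-block overhead, so it is only a lower bound on heap for a format that loads
everything into memory. The unique-value count below is hypothetical.

```python
def ordinal_bits(num_unique_values: int) -> int:
    """Bits needed to store one ordinal in the range [0, num_unique_values)."""
    if num_unique_values < 1:
        raise ValueError("need at least one unique value")
    # At least 1 bit, even when there is only a single unique value.
    return max(1, (num_unique_values - 1).bit_length())


def packed_ordinal_bytes(num_docs: int, num_unique_values: int) -> int:
    """Bytes for one packed ordinal per document (single-valued field only),
    ignoring the term dictionary and any per-block overhead."""
    bits = ordinal_bits(num_unique_values)
    # Round the total bit count up to whole bytes.
    return (num_docs * bits + 7) // 8


if __name__ == "__main__":
    docs = 200_000_000
    unique = 100_000_000  # hypothetical; the actual count is only "very large"
    total = packed_ordinal_bytes(docs, unique)
    print(ordinal_bits(unique), total, round(total / 2**30, 2))
```

With these hypothetical numbers the ordinals alone need 27 bits per document,
about 0.63 GiB, before counting the terms themselves.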

Some details of the documentation are appended below.

My apologies if this question should go to the Lucene user list instead of
dev. If it should, please let me know, and also let me know how I can tell
which list to ask.


Tom Burton-West

------------------------------------------------------

The documentation on the Solr wiki seems to be for Solr 4.2 and appears to
contradict the cwiki reference guide:

Cwiki ref guide:
https://cwiki.apache.org/confluence/display/solr/DocValues

"The default implementation employs a mixture of loading some things into
memory and keeping some on disk. In some cases, however, you may choose to
either keep everything on disk or keep it in memory. You can do this by
defining docValuesFormat="Disk" or docValuesFormat="Memory" on the field
type. This example shows defining the format as 'Disk'."

Solr Wiki:
http://wiki.apache.org/solr/DocValues
docValuesFormat="Lucene42": This is the default, which loads everything
into heap memory.

docValuesFormat="Disk": This implementation has a different layout, to try
to keep most data on disk but with reasonable performance.

docValuesFormat="SimpleText": Plain-text, slow, and not for production.

On the other hand, the Lucene 4.6 JavaDocs list a DiskDocValuesFormat
http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/diskdv/DiskDocValuesFormat.html

and a DirectDocValuesFormat
http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/memory/DirectDocValuesFormat.html

as well as the Lucene40, Lucene42, and Lucene45 formats and PerFieldDocValuesFormat:
http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/codecs/DocValuesFormat.html?is-external=true
