There is a ton of misinformation in this thread.

As of Lucene 4.5, the default DocValues format is mostly disk-based (only a
few small structures are kept in RAM). You probably don't need to change
anything from the defaults, unless one of these applies:

If you want everything in RAM, use Memory.
If you want to waste RAM, use Direct.
If you have no RAM, use Disk.
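
For Solr, overriding the default looks roughly like the fragment below. This
is only a sketch: it assumes the schema-aware codec factory is enabled in
solrconfig.xml, and the type name "string_dv_disk" and field name
"topic_facet" are made up for illustration. Swap "Disk" for "Memory" or
"Direct" to try the other formats.

```xml
<!-- solrconfig.xml: lets schema.xml choose a docValuesFormat per field type -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: "string_dv_disk" and "topic_facet" are hypothetical names -->
<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="topic_facet" type="string_dv_disk"
       indexed="true" stored="false" multiValued="true" docValues="true"/>
```

The format is applied when segments are written, so an existing index has to
be reindexed before a different setting takes effect.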


On Fri, Jan 31, 2014 at 4:15 PM, Tom Burton-West <[email protected]> wrote:

> When trying to facet on 200 million documents with a facet field that has
> a very large number of unique values, we are running into OOM's.  See this
> thread for background:
>
> http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
>
> Otis suggested that using DocValues might solve the memory issues.
>
> There seem to be several options for setting the DocValuesFormat.  Can
> someone please clarify what the choices are for Solr 4.6 and what the
> trade-offs are in terms of memory use and faceting performance?
>
> Without digging into the code and doing some performance testing, it's
> difficult to understand the existing documentation. I'd really appreciate
> hearing from people familiar with the issues before I create 3 different
> indexes of 200 million documents to compare each of the options for
> DocValuesFormat.
>
> Some details of the documentation are appended below.
>
> My apologies if this question should go to Lucene user instead of dev.  If
> it should, please let me know and also let me know how I can tell which
> list to ask.
>
>
> Tom Burton-West
>
> ------------------------------------------------------
>
>   The documentation on the Solr wiki seems to be for Solr 4.2 and seems to
> contradict the cwiki reference guide:
>
> Cwiki ref guide:
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> "The default implementation employs a mixture of loading some things into
> memory and keeping some on disk. In some cases, however, you may choose to
> either keep everything on disk or keep it in memory. You can do this by
> defining docValuesFormat="Disk" or docValuesFormat="Memory" on the field
> type. This example shows defining the format as "Disk"
>
> Solr Wiki:
> http://wiki.apache.org/solr/DocValues
> docValuesFormat="Lucene42": This is the default, which loads everything
> into heap memory.
>
> docValuesFormat="Disk": This implementation has a different layout, to
> try to keep most data on disk but with reasonable performance.
>
> docValuesFormat="SimpleText": Plain-text, slow, and not for production.
>
> On the other hand, the Lucene JavaDocs for Lucene 4.6 show both a
> DiskDocValuesFormat
> http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/diskdv/DiskDocValuesFormat.html
>
>  and a DirectDocValuesFormat
> http://lucene.apache.org/core/4_6_1/codecs/org/apache/lucene/codecs/memory/DirectDocValuesFormat.html
>
> as well as Lucene4(0|2|5) and PerFieldDocValues Formats.
>
> http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/codecs/DocValuesFormat.html?is-external=true
>
