Tom,

As Shawn said, the Disk docValuesFormat is what you're looking for to
save memory. The PerFieldDocValuesFormat supports a specific docValues
format for each field. The Lucene4* docValues formats are the default
compressed in-memory formats. The DirectDocValuesFormat is an uncompressed
in-memory format, undocumented at the Solr level but likely available by
specifying "Direct" as the docValues format.
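
A minimal sketch of how this might look in a Solr 4.x config. The type
name "string_dv_disk" and the field name "topic_facet" are hypothetical
placeholders, not names from your schema, and I have not tested this
against 4.6 specifically:

```xml
<!-- solrconfig.xml: per-field docValues formats declared in schema.xml
     are only honored when the schema-aware codec factory is enabled -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: "string_dv_disk" and "topic_facet" are hypothetical names -->
<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="topic_facet" type="string_dv_disk" indexed="true" stored="false"
       docValues="true"/>
```

Swapping "Disk" for "Direct" would be the way to try the
DirectDocValuesFormat, assuming it is registered under that name. Either
way, the field has to be reindexed after the change.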


Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 31, 2014 at 9:39 PM, Shawn Heisey <[email protected]> wrote:

> On 1/31/2014 2:15 PM, Tom Burton-West wrote:
> > When trying to facet on 200 million documents with a facet field that
> > has a very large number of unique values, we are running into OOMs.
> > See this thread for background:
> >
> http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
> >
> > Otis suggested that using DocValues might solve the memory issues.
> >
> > There seem to be several options for setting the DocValuesFormat.  Can
> > someone please clarify what the choices are for Solr 4.6 and what the
> > trade-offs are in terms of memory use and faceting performance?
>
> To minimize the amount of heap memory required, you should use the disk
> format.  There is one caveat, though -- only the default format is
> compatible when an index built with one Solr version is used with a
> newer Solr version.  If you set it to disk, there's a very good chance
> that you'll need to wipe out your index and rebuild it from scratch when
> you upgrade Solr.  Rebuilding is of course always recommended, but your
> index is not typical.
>
> I've heard that if you change the docValues format back to default and
> optimize your index, you can then upgrade safely, go back to disk, and
> optimize again, but I've never actually tried this.  I always rebuild
> from scratch when I upgrade.  I would imagine that until the optimize
> were to finish, anything that actually used the docValues wouldn't work
> right.
>
> Side anecdote: Because I maintain and update two completely separate
> production copies of my main Solr index (rather than use replication),
> upgrades are not terribly painful for us, even though a complete rebuild
> is always done whenever there's an upgrade or a significant config change.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
