Tom,

As Shawn said, the Disk docValuesFormat is what you're looking for if you want to save memory. PerFieldDocValuesFormat lets you assign a specific docValues format to each field. The Lucene4* docValues formats are the default, compressed, in-memory formats. DirectDocValuesFormat is an uncompressed in-memory format. It isn't documented at the Solr level, but it is likely available by specifying "Direct" as the docValues format.
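The per-field idea is easier to see in code. Below is a toy Java model of the behavior, not Lucene code. Each field may name its own format, and any field without an override falls back to the default.

Several parts of it are my own stand-ins:
- The class name, the field names and the "Default" label are hypothetical.
- Format names are plain strings here.

As far as I know, in Lucene 4.x the real lookup is the codec's getDocValuesFormatForField(String), which resolves names such as "Disk" via DocValuesFormat.forName. In Solr I believe it is wired up through a docValuesFormat attribute on the field type together with solr.SchemaCodecFactory. Please check the reference guide for your version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of per-field docValues format selection.
// It only mimics the idea behind Lucene's PerFieldDocValuesFormat:
// a field may name its own format; unlisted fields use the default.
final class PerFieldFormatSketch {
    // Assumption: "Default" is a placeholder label, not a real format name.
    static final String DEFAULT_FORMAT = "Default";

    private final Map<String, String> overrides = new HashMap<>();

    // Assign a specific format (e.g. "Disk" or "Direct") to one field.
    PerFieldFormatSketch withFormat(String field, String format) {
        overrides.put(field, format);
        return this;
    }

    // Return the field's own format if one was set, else the default.
    String formatForField(String field) {
        return overrides.getOrDefault(field, DEFAULT_FORMAT);
    }

    public static void main(String[] args) {
        PerFieldFormatSketch codec = new PerFieldFormatSketch()
                .withFormat("hugeFacetField", "Disk"); // hypothetical field name
        System.out.println(codec.formatForField("hugeFacetField")); // Disk
        System.out.println(codec.formatForField("title"));          // Default
    }
}
```

The point is that only the high-cardinality facet field needs to move to Disk. Every other field can keep the default format.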
Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 31, 2014 at 9:39 PM, Shawn Heisey wrote:

> On 1/31/2014 2:15 PM, Tom Burton-West wrote:
> > When trying to facet on 200 million documents with a facet field that
> > has a very large number of unique values, we are running into OOM's.
> > See this thread for background:
> >
> > http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
> >
> > Otis suggested that using DocValues might solve the memory issues.
> >
> > There seem to be several options for setting the DocValuesFormat. Can
> > someone please clarify what the choices are for Solr 4.6 and what the
> > trade-offs are in terms of memory use and faceting performance?
>
> To minimize the amount of heap memory required, you should use the disk
> format. There is one caveat, though -- only the default format is
> compatible when using an index built with one Solr version with a newer
> Solr version. If you set it to disk, there's a very good chance that
> you'll need to wipe out your index and rebuild it from scratch when you
> upgrade Solr. That is of course always recommended, but your index is
> not typical.
>
> I've heard that if you change the docValues format back to default and
> optimize your index, you can then upgrade safely, go back to disk, and
> optimize again, but I've never actually tried this. I always rebuild
> from scratch when I upgrade. I would imagine that until the optimize
> were to finish, anything that actually used the docValues wouldn't work
> right.
>
> Side anecdote: Because I maintain and update two completely separate
> production copies of my main Solr index (rather than use replication),
> upgrades are not terribly painful for us, even though a complete rebuild
> is always done whenever there's an upgrade or a significant config change.
>
> Thanks,
> Shawn
