On 1/31/2014 2:15 PM, Tom Burton-West wrote:
> When trying to facet on 200 million documents with a facet field that
> has a very large number of unique values, we are running into OOM's.
> See this thread for background:
> http://lucene.472066.n3.nabble.com/Estimating-peak-memory-use-for-UnInvertedField-faceting-tt4100044.html
>
> Otis suggested that using DocValues might solve the memory issues.
>
> There seem to be several options for setting the DocValuesFormat. Can
> someone please clarify what the choices are for Solr 4.6 and what the
> trade-offs are in terms of memory use and faceting performance?
To minimize the amount of heap memory required, you should use the disk format.

There is one caveat, though: only the default format is compatible across versions, that is, when an index built with one Solr version is opened by a newer Solr version. If you set it to disk, there's a very good chance that you'll need to wipe out your index and rebuild it from scratch when you upgrade Solr. Rebuilding is of course always recommended, but your index is not typical.

I've heard that if you change the docValues format back to default and optimize your index, you can then upgrade safely, switch back to disk, and optimize again, but I've never actually tried this. I always rebuild from scratch when I upgrade. I would imagine that until the optimize finishes, anything that actually uses the docValues wouldn't work right.

Side anecdote: Because I maintain and update two completely separate production copies of my main Solr index (rather than use replication), upgrades are not terribly painful for us, even though a complete rebuild is always done whenever there's an upgrade or a significant config change.

Thanks,
Shawn
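
For reference, a minimal sketch of how the disk format is typically selected in Solr 4.x, assuming a single string facet field. The type and field names (`string_disk`, `facet_field`) are hypothetical placeholders, not taken from the original poster's schema, and the field attributes other than `docValues` and `docValuesFormat` are only illustrative.

```xml
<!-- solrconfig.xml: a per-field docValuesFormat is only honored
     when the schema-aware codec factory is configured -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: hypothetical type and field names -->
<!-- docValuesFormat="Disk" keeps most docValues data on disk instead of on the heap -->
<fieldType name="string_disk" class="solr.StrField" docValuesFormat="Disk"/>

<!-- docValues="true" enables docValues for the facet field -->
<field name="facet_field" type="string_disk" indexed="true" stored="false" docValues="true"/>
```

Enabling docValues or changing the format on an existing field means the affected documents have to be reindexed (or, per the untested approach above, the index optimized) before the new format is what is actually on disk.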
