Hi everyone,
 
My Solr JVM runs out of heap space quite frequently. I'm trying to
understand Solr/Lucene's memory usage so I can address the problem
correctly. Otherwise, I feel I'm taking random shots in the dark.
 
I've tried previous troubleshooting suggestions. Here's what I've done:
 
1) Increased Tomcat's JVM heap space, e.g.:
    JAVA_OPTS='-Xmx1244m -Xms1244m -server'; # frequent heap space problems
    JAVA_OPTS='-XX:+AggressiveHeap -server'; # runs out of heap space at 2.0g
    JAVA_OPTS='-Xmx3072m -Xms3072m -server'; # jvm quickly hits 2.9g in 'top'
 
Solr is the only webapp deployed on this Tomcat instance.
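As a sanity check that each -Xmx value actually takes effect under Tomcat, I
print the JVM's own view of its heap limits (a minimal sketch using the
standard Runtime API; the class name is just for illustration -- run it with
the same JAVA_OPTS, or drop the equivalent into a JSP):

    // HeapCheck.java -- quick check that the configured heap limits apply.
    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024L * 1024L;
            // maxMemory() should roughly match the -Xmx passed to the JVM.
            System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");
            System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
            System.out.println("free heap:  " + rt.freeMemory() / mb + " MB");
        }
    }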
 
2) I use Solr collection/distribution to separate indexing and searching.
The indexer is stable now, and memory problems occur only when searching on
the Solr slave.
 
3) In solrconfig.xml, I reduced mergeFactor and maxBufferedDocs by 50%:
    <mergeFactor>5</mergeFactor>
    <maxBufferedDocs>500</maxBufferedDocs>
 
This helped the indexing server but not the Solr slave.
 
4) In solrconfig.xml, I set the sizes of filterCache, queryResultCache, and
documentCache to 0.
 
Now for my index details: 
- To facilitate highlighting, I currently store doc contents in the index,
so the index consumes 24GB on disk.
- numDocs : 4,953,736 
  maxDoc : 4,953,736 (just optimized)
- Term files (rough in-memory estimate after this list):
   logs # du -ksh ../solr/data/index/*.t??
   5.9M    ../solr/data/index/_1kjb.tii
   429M    ../solr/data/index/_1kjb.tis
- I have 22 fields and yes, they currently have norms.
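Regarding the term files above: as I understand it, Lucene loads the whole
.tii into RAM as Term objects, so its in-memory footprint is some multiple of
the 5.9M on disk. A back-of-the-envelope sketch (the 3-5x expansion factor is
purely my assumption, not a measured value):

    // TiiEstimate.java -- rough in-memory size of the term index (.tii).
    public class TiiEstimate {
        public static void main(String[] args) {
            long tiiOnDiskBytes = 5900000L; // du reports 5.9M for _1kjb.tii
            long mb = 1024L * 1024L;
            // Assume a 3x-5x blowup for Term/String object overhead in RAM.
            for (int factor = 3; factor <= 5; factor++) {
                System.out.println(factor + "x -> "
                        + (tiiOnDiskBytes * factor) / mb + " MB");
            }
        }
    }

Even at 5x that's only ~28MB, so the term index alone can't explain the heap
growth.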

Other info that may be helpful:
- My Solr build is from 2006-11-15. We have a few mods, including one extra
fieldCache that stores ~40 bytes/doc (sizing arithmetic after this list).
- Thread counts from solr/admin/threaddump.jsp:
  Java HotSpot(TM) 64-Bit Server VM 1.5.0_08-b03
  Thread Count: current=37 daemon=34 peak=37
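The arithmetic for that extra fieldCache (assuming the ~40 bytes/doc figure
holds across all docs):

    // FieldCacheEstimate.java -- footprint of our extra per-doc fieldCache.
    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long numDocs = 4953736L;  // from the index stats above
            long bytesPerDoc = 40L;   // our rough measurement
            System.out.println(numDocs * bytesPerDoc / (1024L * 1024L) + " MB");
            // prints 188, i.e. ~189MB once the cache is fully populated
        }
    }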
 
My machine runs Gentoo Linux with 4GB of RAM. 'top' shows the JVM reaching
2.9g resident (3472m virtual) after 10-20 searches and ~20 minutes of use. It
seems only a matter of time before more searches or a snapinstaller 'commit'
makes it run out of heap space again.
 
I have flexibility in the changes we can make. E.g., I can omit norms for
most fields, or I can stop storing the doc contents in the index. But before
embarking on a new strategy, I need some assurance that it will work (crazy,
I know). For example, it doesn't seem that removing norms would save a great
deal: at 1 byte per norm per doc per field, omitting norms on 21 fields saves
only ~99MB.
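For completeness, the arithmetic behind that ~99MB figure (Lucene keeps norms
in memory at 1 byte per doc per field, as far as I know):

    // NormsEstimate.java -- memory saved by omitting norms on 21 fields.
    public class NormsEstimate {
        public static void main(String[] args) {
            long maxDoc = 4953736L;    // from the stats above
            int fieldsWithNorms = 21;  // fields where norms could be omitted
            long bytes = maxDoc * fieldsWithNorms; // 1 byte per doc per field
            System.out.println(bytes / (1024L * 1024L) + " MB"); // prints 99
        }
    }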
 
So...how do I deduce what's taking up so much memory? Any suggestions would
be very helpful to me (and hopefully to others, too).
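One thing I can do in the meantime is log heap vs. non-heap usage over time
via the java.lang.management API (new in 1.5). A minimal sketch -- in
practice this would run as a background thread inside the webapp rather than
as its own main():

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    // MemLog.java -- periodically log the JVM's heap and non-heap usage.
    public class MemLog {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                System.out.println("heap:     " + mem.getHeapMemoryUsage());
                System.out.println("non-heap: " + mem.getNonHeapMemoryUsage());
                Thread.sleep(10000L); // every 10 seconds
            }
        }
    }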
 
many thanks,
-Graham
