Thanks Shalin, but I reviewed the code in trunk, and it still passes PER_FIELD. I can double check but I'm pretty sure that's what I saw.

Shai

On Jul 24, 2015 7:59 AM, "Shalin Shekhar Mangar" <[email protected]> wrote:

> Uwe fixed this in 4.10 with LUCENE-5803. Now we use
> GLOBAL_REUSE_STRATEGY on a per-field type basis. One of my todos is to
> create field types per node instead of per core for more savings.
>
> On Fri, Jul 24, 2015 at 3:24 AM, Shai Erera <[email protected]> wrote:
> > Hi
> >
> > I am helping to debug a Solr (4.7) deployment which shows >5.5GB of heap
> > usage by IndexSchema. This Solr in particular has one collection with 64
> > shards (2 replicas, but 64 cores on one node). The schema has ~120 fields,
> > ~20 of them of the same field type (text_general). The deployment serves
> > around 700 concurrent users (peak), with a thread pool limit of 1000.
> >
> > They've tried reducing the thread-pool size, but the load is high and the
> > server keeps up fine with it at the current thread-pool size.
> >
> > What surprised me is that they report obscene numbers they see in the heap:
> > 680K (!!) objects of TokenStreamComponents, each holds a buffer of 8KB
> > coming from StandardTokenizerImpl.zzBuffer. That surprised me because I
> > thought that a TokenStreamComponents can be (and is) reused for all fields
> > in a document. And so even if we hold a ThreadLocal per
> > TokenStreamComponents, we should see 1000 of them at the most - per
> > Analyzer. And as I said, the analyzed fields are of type text_general, and
> > the rest of the fields are numeric, DV, String, Bool etc. (aka
> > not-analyzed).
> >
> > Reviewing IndexSchema it holds two instances: SolrIndexAnalyzer (extends
> > DelegatingAnalyzerWrapper) and SolrQueryAnalyzer (extends
> > SolrIndexAnalyzer). SolrIndexAnalyzer's constructor sets ReuseStrategy ==
> > PER_FIELD_REUSE_STRATEGY. This might explain the 680K objects in the heap:
> >
> > 64 (cores) x 700 (threads) x 20 (fields) = 896K (more than 680K, but it
> > could be they served fewer than 700 users when the heap dump was taken).
> >
> > And if each such instance holds a zzBuffer of size 8KB, this amounts to >7GB
> > of heap space!
> >
> > Per Analyzer's constructor (which takes ReuseStrategy):
> >
> >   /**
> >    * Expert: create a new Analyzer with a custom {@link ReuseStrategy}.
> >    * <p>
> >    * NOTE: if you just want to reuse on a per-field basis, it's easier to
> >    * use a subclass of {@link AnalyzerWrapper} such as
> >    * <a href="{@docRoot}/../analyzers-common/org/apache/lucene/analysis/miscellaneous/PerFieldAnalyzerWrapper.html">
> >    * PerFieldAnalyzerWrapper</a> instead.
> >    */
> >
> > However, AnalyzerWrapper's documentation somewhat contradicts it (I think):
> >
> >   /**
> >    * Creates a new AnalyzerWrapper with the given reuse strategy.
> >    * <p>If you want to wrap a single delegate Analyzer you can probably
> >    * reuse its strategy when instantiating this subclass:
> >    * {@code super(delegate.getReuseStrategy());}.
> >    * <p>If you choose different analyzers per field, use
> >    * {@link #PER_FIELD_REUSE_STRATEGY}.
> >    * @see #getReuseStrategy()
> >    */
> >
> > Maybe it is correct for AW, but not for DelegatingAW?
> >
> > From what I understand, we should be OK setting a GLOBAL_REUSE_STRATEGY
> > since SolrIndexAnalyzer returns different Analyzers for different fields
> > (per their field-type). But all fields that share the same Analyzer instance
> > should be safe reusing its TokenStreamComponents, since we never process
> > fields in parallel?
> >
> > To that extent, I also feel like PerFieldAnalyzerWrapper shouldn't pass
> > PER_FIELD_REUSE_STRATEGY (since it too returns different Analyzer instances
> > for different fields), but it's the only piece of the puzzle that confuses
> > me, since I trust whoever wrote this class to understand this stuff better
> > than I do ...
> >
> > What do you think?
> >
> > Shai
>
> --
> Regards,
> Shalin Shekhar Mangar.
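
For reference, the worst-case estimate discussed in the thread can be reproduced with a few lines of plain Java. This is only a back-of-the-envelope sketch, not Solr or Lucene code: the class and method names (`ReuseEstimate`, `cachedComponents`, `heapBytes`) are hypothetical, and it assumes that every request thread has touched every analyzed field in every core and that text_general is the only analyzed field type. With the figures quoted above, 64 x 700 x 20 works out to 896,000 cached TokenStreamComponents under per-field reuse, versus 44,800 if components are cached once per thread per field type.

```java
public class ReuseEstimate {

    // Worst case: every request thread in every core caches one
    // TokenStreamComponents per cache key. With per-field reuse the key is
    // the field name; with per-analyzer reuse it is the field type's analyzer.
    static long cachedComponents(long cores, long threads, long cacheKeys) {
        return cores * threads * cacheKeys;
    }

    // Heap held by the tokenizer buffers alone (ignores all other overhead).
    static long heapBytes(long components, long bufferBytes) {
        return components * bufferBytes;
    }

    public static void main(String[] args) {
        long cores = 64;              // cores on the node (from the thread)
        long threads = 700;           // concurrent users at peak (from the thread)
        long analyzedFields = 20;     // text_general fields (from the thread)
        long analyzedFieldTypes = 1;  // ASSUMPTION: text_general is the only analyzed type
        long bufferBytes = 8 * 1024;  // 8KB zzBuffer per instance (from the thread)

        long perField = cachedComponents(cores, threads, analyzedFields);
        long perType = cachedComponents(cores, threads, analyzedFieldTypes);

        System.out.println("per-field reuse:    " + perField + " components, "
                + heapBytes(perField, bufferBytes) + " bytes");
        System.out.println("per-analyzer reuse: " + perType + " components, "
                + heapBytes(perType, bufferBytes) + " bytes");
    }
}
```

Under these assumptions the buffer footprint drops by a factor equal to the number of fields sharing an analyzer (20 here), which is the saving the per-field-type reuse described above is meant to deliver.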
