Re: Why do SolrIndex/QueryAnalyzers use PER_FIELD_REUSE_STRATEGY

Shai Erera Thu, 23 Jul 2015 23:41:14 -0700

Thanks Shalin, but I reviewed the code in trunk, and it still passes
PER_FIELD. I can double check but I'm pretty sure that's what I saw.


Shai
On Jul 24, 2015 7:59 AM, "Shalin Shekhar Mangar" <[email protected]>
wrote:

> Uwe fixed this in 4.10 with LUCENE-5803. Now we use
> GLOBAL_REUSE_STRATEGY on a per-field type basis. One of my todos is to
> create field types per node instead of per core for more savings.
>
> On Fri, Jul 24, 2015 at 3:24 AM, Shai Erera <[email protected]> wrote:
> > Hi
> >
> > I am helping to debug a Solr (4.7) deployment which shows >5.5GB of heap
> > usage by IndexSchema. This Solr in particular has one collection with 64
> > shards (2 replicas, but 64 cores on one node). The schema has ~120
> > fields, ~20 of them of the same field type (text_general), and it is
> > serving around 700 concurrent users (peak), with a thread pool limit of
> > 1000.
> >
> > They have tried reducing the thread-pool size, but the load is high, and
> > the server keeps up fine with both the load and a thread pool of that
> > size.
> >
> > What surprised me is that they report obscene numbers they see in the
> > heap: 680K (!!) objects of TokenStreamComponents, each holds a buffer of
> > 8KB coming from StandardTokenizerImpl.zzBuffer. That surprised me because
> > I thought that a TokenStreamComponents can be (and is) reused for all
> > fields in a document. And so even if we hold a ThreadLocal per
> > TokenStreamComponents, we should see 1000 of them at the most - per
> > Analyzer. And as I said, the analyzed fields are of type text_general,
> > and the rest of the fields are numeric, DV, String, Bool etc. (aka
> > not-analyzed).
> >
> > Reviewing IndexSchema, I see it holds two instances: SolrIndexAnalyzer
> > (extends DelegatingAnalyzerWrapper) and SolrQueryAnalyzer (extends
> > SolrIndexAnalyzer). SolrIndexAnalyzer's constructor sets ReuseStrategy ==
> > PER_FIELD_REUSE_STRATEGY. This might explain the 680K objects in the
> > heap:
> >
> > 64 (cores) x 700 (threads) x 20 (fields) = 896K (more than 680K, but it
> > could be that they served fewer than 700 users when the heap dump was
> > taken).
> >
> > And if each such instance holds a zzBuffer of size 8KB, this amounts to
> > more than 7GB of heap space!
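
As a rough check of the estimate above, the following sketch multiplies out the quoted figures (64 cores, 700 threads, 20 fields, 8KB per buffer, 680K observed objects). The class and method names are hypothetical, and the model of one cached TokenStreamComponents per (core, thread, field) is an assumption taken from the reasoning in the message, not something measured on the deployment.

```java
// Hypothetical helper, not part of Solr or Lucene.
class HeapEstimate {
    // Assumed model: a per-field reuse strategy caches one
    // TokenStreamComponents per (core, thread, analyzed field).
    static long components(long cores, long threads, long fields) {
        return cores * threads * fields;
    }

    // Bytes pinned if every cached component holds one buffer of this size.
    static long bufferBytes(long components, long bytesPerBuffer) {
        return components * bytesPerBuffer;
    }

    public static void main(String[] args) {
        long bufferSize = 8L * 1024;               // 8KB zzBuffer, as quoted
        long worstCase = components(64, 700, 20);  // figures from the message
        long observed = 680_000L;                  // objects seen in the heap dump

        System.out.println(worstCase);                          // prints 896000
        System.out.println(bufferBytes(worstCase, bufferSize)); // prints 7340032000
        System.out.println(bufferBytes(observed, bufferSize));  // prints 5570560000
    }
}
```

Under these assumptions the observed 680K objects account for roughly 5.5GB, which is in line with the heap usage reported for IndexSchema.
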
> >
> > Per Analyzer's constructor (which takes ReuseStrategy):
> >
> >   /**
> >    * Expert: create a new Analyzer with a custom {@link ReuseStrategy}.
> >    * <p>
> >    * NOTE: if you just want to reuse on a per-field basis, it's easier to
> >    * use a subclass of {@link AnalyzerWrapper} such as
> >    * <a href="{@docRoot}/../analyzers-common/org/apache/lucene/analysis/miscellaneous/PerFieldAnalyzerWrapper.html">
> >    * PerFieldAnalyzerWrapper</a> instead.
> >    */
> >
> > However, AnalyzerWrapper's documentation somewhat contradicts it (I
> > think):
> >
> >   /**
> >    * Creates a new AnalyzerWrapper with the given reuse strategy.
> >    * <p>If you want to wrap a single delegate Analyzer you can probably
> >    * reuse its strategy when instantiating this subclass:
> >    * {@code super(delegate.getReuseStrategy());}.
> >    * <p>If you choose different analyzers per field, use
> >    * {@link #PER_FIELD_REUSE_STRATEGY}.
> >    * @see #getReuseStrategy()
> >    */
> >
> > Maybe it is correct for AW, but not for DelegatingAW?
> >
> > From what I understand, we should be OK setting a GLOBAL_REUSE_STRATEGY
> > since SolrIndexAnalyzer returns different Analyzers for different fields
> > (per their field-type). But all fields that share the same Analyzer
> > instance should be safe reusing its TokenStreamComponents, since we never
> > process fields in parallel?
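
To make the difference concrete, here is a toy model of the two caching schemes discussed above. It does not use Lucene's classes: `Components`, `cachedComponents`, and `textGeneralSchema` are hypothetical stand-ins, and the sketch assumes a single thread that analyzes fields one at a time. It only counts how many cached objects one thread ends up holding when the cache is keyed by field name versus by the shared analyzer (field type).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; none of these types exist in Lucene or Solr.
class ReuseSketch {
    // Stand-in for TokenStreamComponents holding an 8KB buffer
    // (4096 chars * 2 bytes), like the zzBuffer mentioned in the thread.
    static final class Components {
        final char[] buffer = new char[4096];
    }

    // Builds a schema of n fields that all share the text_general type.
    static Map<String, String> textGeneralSchema(int n) {
        Map<String, String> fieldToType = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            fieldToType.put("text_" + i, "text_general");
        }
        return fieldToType;
    }

    // Simulates one thread analyzing every field once and returns how many
    // Components objects its cache holds afterwards.
    // keyByField = true  models caching per field name.
    // keyByField = false models one cached entry per shared analyzer.
    static int cachedComponents(Map<String, String> fieldToType, boolean keyByField) {
        Map<String, Components> threadCache = new HashMap<>();
        for (Map.Entry<String, String> e : fieldToType.entrySet()) {
            String key = keyByField ? e.getKey() : e.getValue();
            threadCache.computeIfAbsent(key, k -> new Components());
        }
        return threadCache.size();
    }

    public static void main(String[] args) {
        Map<String, String> schema = textGeneralSchema(20);
        System.out.println(cachedComponents(schema, true));  // prints 20
        System.out.println(cachedComponents(schema, false)); // prints 1
    }
}
```

In this model, keying by field name keeps 20 buffers alive per thread for the 20 text_general fields, while keying by the shared analyzer keeps one. Sharing is only safe under the assumption stated in the message, namely that a thread never processes two fields in parallel.
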
> >
> > To that extent, I also feel like PerFieldAnalyzerWrapper shouldn't pass
> > PER_FIELD_REUSE_STRATEGY (since it too returns different Analyzer
> > instances for different fields), but it's the only piece of the puzzle
> > that confuses me, since I trust whoever wrote this class to understand
> > this stuff better than I do ...
> >
> > What do you think?
> >
> > Shai
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
