[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment: patch.txt
    
> Frequent full GC during indexing stress test
> --------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 fields
> Cores: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>         Attachments: patch.txt
>
>
> We recently tested Solr indexing throughput and performance: 20 fields of the 
> ordinary text_general type, 1000 Jetty threads, and 5 cores defined.
> After the test had run for a while, throughput dropped sharply. Tracing the 
> root cause, we found the Java process constantly performing full GCs.
> A heap dump shows that the dominant objects are StandardTokenizer instances, 
> held in the CloseableThreadLocal used by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse strategy, so every field 
> gets its own StandardTokenizer if it uses the standard analyzer, and each 
> StandardTokenizer occupies about 32KB of memory because of its zzBuffer char 
> array.
> Worst case: total memory = live threads * cores * fields * 32KB.
> In our test that is 1000 * 5 * 20 * 32KB = 3.2GB just for StandardTokenizer, 
> and those objects are only released when their thread dies.
> Suggestion:
> Each request is handled by exactly one thread, so a document is analyzed by a 
> single thread, field by field. Fields of the same type can therefore share one 
> reused component: when the thread moves on to the next field of the same type, 
> only the input stream of the shared component needs to be reset. This saves a 
> lot of memory for fields of the same type:
> Total memory = live threads * cores * (distinct field types) * 32KB
> The source change is simple; I can provide a patch for IndexSchema.java:
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per field
>    * type by keying the stored map on each field's Analyzer, so all fields
>    * sharing an analyzer share one TokenStreamComponents.
>    */
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     @Override
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerAnalyzer != null
>           ? componentsPerAnalyzer.get(analyzers.get(fieldName))
>           : null;
>     }
>
>     @Override
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName,
>                                       TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerAnalyzer == null) {
>         componentsPerAnalyzer = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerAnalyzer);
>       }
>       componentsPerAnalyzer.put(analyzers.get(fieldName), components);
>     }
>   }
>
>   // One entry per field name; built once per analyzer instance.
>   protected final HashMap<String, Analyzer> analyzers;
>
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
>
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null
>         ? analyzer
>         : getDynamicFieldType(fieldName).getAnalyzer();
>   }
>
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName,
>       TokenStreamComponents components) {
>     return components;
>   }
> }
>
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getQueryAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null
>         ? analyzer
>         : getDynamicFieldType(fieldName).getQueryAnalyzer();
>   }
> }

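As a sanity check on the memory arithmetic in the report, here is a small standalone sketch. The figure of 3 distinct field types in the "after" case is an assumed example for illustration, not a number from the report.

```java
// Sketch of the memory estimate from the report above. The 32KB figure is
// StandardTokenizer's zzBuffer; the count of 3 distinct field types in the
// "after" case is an assumed example, not a number from the report.
public class TokenizerMemoryEstimate {

    static final long TOKENIZER_BYTES = 32L * 1024; // zzBuffer per tokenizer

    // Worst case with PerFieldReuseStrategy: one tokenizer per live
    // thread, per core, per field.
    static long perField(long threads, long cores, long fields) {
        return threads * cores * fields * TOKENIZER_BYTES;
    }

    // With the suggested per-field-type reuse: one tokenizer per live
    // thread, per core, per distinct field type.
    static long perFieldType(long threads, long cores, long fieldTypes) {
        return threads * cores * fieldTypes * TOKENIZER_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(perField(1000, 5, 20));    // 3276800000 (~3.2GB)
        System.out.println(perFieldType(1000, 5, 3)); // 491520000 (~0.5GB)
    }
}
```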
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
