Thanks a lot!
> "large text fields"
What is a good limit (in characters) for switching from StringField to TextField?
Do <Language>Analyzers (e.g. GermanAnalyzer) help much in reducing the size
of an index?
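
For what it's worth, in Lucene 4.x the StringField/TextField choice is less about length than about tokenization: StringField indexes the whole value as one unanalyzed term (and omits norms), while TextField runs the value through the analyzer. One way to judge how much an analyzer such as GermanAnalyzer shrinks the vocabulary is to look at the tokens it emits. The sketch below assumes Lucene 4.6 with the lucene-analyzers-common jar on the classpath. The class and method names (AnalyzerPeek, tokens, germanTokens) and the field name "body" are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzerPeek {

    /** Returns the terms that {@code analyzer} would index for {@code text} in {@code field}. */
    public static List<String> tokens(Analyzer analyzer, String field, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
            try {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();                      // required before the first incrementToken()
                while (ts.incrementToken()) {
                    terms.add(term.toString());  // the term exactly as it would be indexed
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked to keep callers simple.
            throw new RuntimeException(e);
        }
        return terms;
    }

    /** Tokens produced by GermanAnalyzer for a hypothetical "body" field (assumes Lucene 4.6). */
    public static List<String> germanTokens(String text) {
        Analyzer german = new GermanAnalyzer(Version.LUCENE_46);
        try {
            return tokens(german, "body", text);
        } finally {
            german.close();
        }
    }
}
```

For example, AnalyzerPeek.germanTokens("Die Bedienungsanleitung der Waschmaschine") should drop stop words such as "die" and "der" and return lower-cased, lightly stemmed terms. Comparing such lists for a few representative documents gives a rough feel for how many distinct terms end up in the terms dictionary.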

> Add XXXDocValuesField instead of e.g. StringField.
Does this apply only for StringFields? Or for TextFields too?

> Upgrade to the upcoming Lucene 4.9
We have not yet transitioned to Java 7/8 (which Lucene 4.8+ requires) ... hopefully soon ;)

> and take a heap dump and see what's using RAM
Below is a snippet from MemoryAnalyzer:
Shallow Heap | Retained Heap | Percentage | Class Name
-------------+---------------+------------+------------------------------------------------------------------------------
          72 |    59'255'872 |      3.04% | org.apache.lucene.index.StandardDirectoryReader @ 0x783932460
         112 |    59'190'960 |      3.03% | |- org.apache.lucene.index.SegmentReader[24] @ 0x794089ee0
          72 |    16'905'072 |      0.87% | |  |- org.apache.lucene.index.SegmentReader @ 0x788820f40
          56 |    16'895'576 |      0.87% | |  |  |- org.apache.lucene.index.SegmentCoreReaders @ 0x7910cacc8
          24 |    16'864'864 |      0.86% | |  |  |  |- org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader @ 0x780661c50
          56 |    16'864'240 |      0.86% | |  |  |  |  |- org.apache.lucene.codecs.BlockTreeTermsReader @ 0x7910cae50
          48 |    16'858'472 |      0.86% | |  |  |  |  |  |- java.util.TreeMap @ 0x783902738
          40 |    16'858'424 |      0.86% | |  |  |  |  |  |  '- java.util.TreeMap$Entry @ 0x77ec5f9f8
          40 |    10'895'656 |      0.56% | |  |  |  |  |  |     |- java.util.TreeMap$Entry @ 0x77ec5fa20
          40 |     5'960'072 |      0.31% | |  |  |  |  |  |     |- java.util.TreeMap$Entry @ 0x77ec5fa48
          40 |     5'958'072 |      0.31% | |  |  |  |  |  |     |  |- java.util.TreeMap$Entry @ 0x77ec5fa98
          40 |     5'949'864 |      0.30% | |  |  |  |  |  |     |  |  |- java.util.TreeMap$Entry @ 0x77fc09bf0
          72 |         8'168 |      0.00% | |  |  |  |  |  |     |  |  |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820e20
             |               |            | |  |  |  |  |  |     |  |  '- Total: 2 entries
          40 |         1'000 |      0.00% | |  |  |  |  |  |     |  |- java.util.TreeMap$Entry @ 0x77ec5fa70
          72 |           960 |      0.00% | |  |  |  |  |  |     |  |  '- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x78347fbc0
         104 |           840 |      0.00% | |  |  |  |  |  |     |  |     |- org.apache.lucene.util.fst.FST @ 0x788fe34c8
         528 |           528 |      0.00% | |  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.fst.FST$Arc[128] @ 0x7870932a0
          40 |           144 |      0.00% | |  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.fst.BytesStore @ 0x77ec5fb60
          24 |           104 |      0.00% | |  |  |  |  |  |     |  |     |  |  '- java.util.ArrayList @ 0x780663b28
          24 |            48 |      0.00% | |  |  |  |  |  |     |  |     |  |- org.apache.lucene.util.BytesRef @ 0x780663b10
          24 |            24 |      0.00% | |  |  |  |  |  |     |  |     |  |  '- byte[5] @ 0x780663b58  .....
          16 |            16 |      0.00% | |  |  |  |  |  |     |  |     |  |- int[0] @ 0x780663af8
             |               |            | |  |  |  |  |  |     |  |     |  '- Total: 4 entries
          24 |            48 |      0.00% | |  |  |  |  |  |     |  |     |- org.apache.lucene.util.BytesRef @ 0x780663ae0
             |               |            | |  |  |  |  |  |     |  |     '- Total: 2 entries
          72 |           960 |      0.00% | |  |  |  |  |  |     |  |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820dd8
             |               |            | |  |  |  |  |  |     |  '- Total: 3 entries
          72 |         2'656 |      0.00% | |  |  |  |  |  |     |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820d90
             |               |            | |  |  |  |  |  |     '- Total: 3 entries
          32 |         4'032 |      0.00% | |  |  |  |  |  |- org.apache.lucene.codecs.lucene41.Lucene41PostingsReader @ 0x78274ab88
          72 |         1'680 |      0.00% | |  |  |  |  |  |- org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput @ 0x788820d48
             |               |            | |  |  |  |  |  '- Total: 3 entries
          48 |           368 |      0.00% | |  |  |  |  |- java.util.TreeMap @ 0x783902798
          48 |           232 |      0.00% | |  |  |  |  |- java.util.HashMap @ 0x7839027c8
             |               |            | |  |  |  |  '- Total: 3 entries
          32 |        17'688 |      0.00% | |  |  |  |- org.apache.lucene.index.SegmentCoreReaders$1 @ 0x78274aaa8
          48 |         6'504 |      0.00% | |  |  |  |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer @ 0x7822983c0
          24 |         3'456 |      0.00% | |  |  |  |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$3 @ 0x7b1424f10
          56 |         1'240 |      0.00% | |  |  |  |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader @ 0x7910e98c8
          32 |           456 |      0.00% | |  |  |  |- org.apache.lucene.index.SegmentCoreReaders$3 @ 0x78274aae8
          40 |           344 |      0.00% | |  |  |  |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader @ 0x77fb743a0
          32 |           256 |      0.00% | |  |  |  |- java.lang.String @ 0x78292d4c8  NIOFSIndexInput(path="/opt/webs/fust.ch/WEB-INF/indexes/1/fr_CH_1/fustusermanuals/full/__data/_n8.fdt")
          32 |           240 |      0.00% | |  |  |  |- org.apache.lucene.index.SegmentCoreReaders$2 @ 0x78274aac8
          24 |           216 |      0.00% | |  |  |  |- java.util.Collections$SynchronizedSet @ 0x780661c68
          48 |           152 |      0.00% | |  |  |  |- sun.nio.ch.FileChannelImpl @ 0x782298420
          32 |            48 |      0.00% | |  |  |  |- java.io.RandomAccessFile @ 0x782933780
          24 |            40 |      0.00% | |  |  |  |- java.io.FileDescriptor @ 0x780b56148
          16 |            16 |      0.00% | |  |  |  |- java.util.concurrent.atomic.AtomicInteger @ 0x780661c38
             |               |            | |  |  |  '- Total: 14 entries
-------------+---------------+------------+------------------------------------------------------------------------------
 
Does this help?

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, June 13, 2014 13:15
To: Lucene Users
Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

On Fri, Jun 13, 2014 at 3:02 AM, Clemens Wyss DEV <clemens...@mysign.ch> wrote:
>> limit how many fields have norms enabled
> We have one index for approx. 7000 PDFs (24GB). Of course no content is STOREd
> (but it is ANALYZEd). This index occupies 4GB on disk, and the corresponding
> IndexReader is 60MB.
> Are norms enabled by default for org.apache.lucene.document.TextField?

Yes.  Norms are a good idea for "large text fields", e.g. body text or a catch-all
field, but usually not a good idea for tiny fields (e.g. title).
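
A minimal sketch of that advice for Lucene 4.x, assuming a large "body" field and a short "title" field (both field names, and the NormsExample class, are hypothetical): keep the stock TextField for the body, and index the title through a copy of TextField.TYPE_NOT_STORED with setOmitNorms(true). Without norms, matches in the title are no longer weighted by field length, which is usually irrelevant for short fields anyway.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

public class NormsExample {

    /** Analyzed, not stored, no norms: intended for short fields such as a title. */
    public static final FieldType SHORT_TEXT_NO_NORMS = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        SHORT_TEXT_NO_NORMS.setOmitNorms(true);
        SHORT_TEXT_NO_NORMS.freeze(); // make the shared type immutable
    }

    public static Document buildDocument(String title, String body) {
        Document doc = new Document();
        // Long body text: default TextField, norms enabled (length normalization helps here).
        doc.add(new TextField("body", body, Field.Store.NO));
        // Short title: same analysis, but no norm value is kept for this field.
        doc.add(new Field("title", title, SHORT_TEXT_NO_NORMS));
        return doc;
    }
}
```

Use the same FieldType for the title in every document you index, so the field is consistently norm-free.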

>> use disk-based doc values not field cache
> How is this done?

Add XXXDocValuesField instead of e.g. StringField.
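
A rough sketch of this for Lucene 4.6, with hypothetical field names "category" and "pageCount" and a made-up DocValuesExample class. Doc values are a per-document column used for sorting, grouping and faceting, not a searchable index, so they typically sit next to the indexed StringField (Lucene 4.x allows one field name to carry both) and do not replace a TextField used for full-text search. SortedDocValuesField holds one string value per document (SortedSetDocValuesField holds several), NumericDocValuesField holds one long.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.util.BytesRef;

public class DocValuesExample {

    public static Document buildDocument(String category, long pageCount) {
        Document doc = new Document();
        // Indexed, unanalyzed term: still needed if you search or filter on the category.
        doc.add(new StringField("category", category, Field.Store.NO));
        // Same value as doc values: read at sort/facet time instead of un-inverting terms.
        doc.add(new SortedDocValuesField("category", new BytesRef(category)));
        // Numeric per-document value (usable for sorting, not searchable by itself).
        doc.add(new NumericDocValuesField("pageCount", pageCount));
        return doc;
    }

    /** Sort by "category"; with SortedDocValues present, sorting should read those. */
    public static Sort byCategory() {
        return new Sort(new SortField("category", SortField.Type.STRING));
    }
}
```

Existing segments only gain doc values once their documents are reindexed with the extra field.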

>> etc.
> such as? ;)

Upgrade to the upcoming Lucene 4.9; there have been some improvements, e.g. to
norms compression.  You can tune your terms index settings, but the terms index
usually doesn't use much RAM.
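
One way to tune the terms index in Lucene 4.6 is to give the default block-tree terms dictionary larger blocks through a per-field codec override. This matters here because the MemoryAnalyzer snippet in this thread shows one segment's BlockTreeTermsReader retaining about 16.9 MB, almost all of it under its per-field FieldReader map. The sketch below is an assumption-laden example: the class name TermsIndexTuning and the block sizes 50/100 are illustrative, not recommended values.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

public class TermsIndexTuning {

    // Illustrative values only. The 4.x defaults are 25 / 48 terms per block; larger
    // blocks mean fewer entries in the in-memory terms index (an FST per field),
    // at the cost of scanning more terms per lookup.
    static final int MIN_TERMS_PER_BLOCK = 50;
    static final int MAX_TERMS_PER_BLOCK = 100; // must be >= 2 * (MIN_TERMS_PER_BLOCK - 1)

    /** The Lucene 4.6 default codec, but with larger terms-dictionary blocks for every field. */
    public static Codec codec() {
        final PostingsFormat largeBlocks =
                new Lucene41PostingsFormat(MIN_TERMS_PER_BLOCK, MAX_TERMS_PER_BLOCK);
        return new Lucene46Codec() {
            @Override
            public PostingsFormat getPostingsFormatForField(String field) {
                return largeBlocks;
            }
        };
    }

    /** Only segments written or merged with this config pick up the new block sizes. */
    public static IndexWriterConfig config(Analyzer analyzer) {
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
        iwc.setCodec(codec());
        return iwc;
    }
}
```

Because the codec keeps the name "Lucene46" and the postings format keeps the name "Lucene41", readers should not need any special configuration to open segments written this way.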

You can fire up your app, get all searchers warmed, and take a heap dump to see
what's using RAM.  We can iterate from there.
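
Besides running jmap -dump:live,format=b,file=heap.hprof <pid> against the process, a heap dump can be triggered from inside the running application once the searchers are warmed. The sketch below uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it assumes an Oracle/OpenJDK HotSpot JVM; HeapDumper is a made-up name, and the code sticks to Java 6 syntax. The resulting .hprof file opens in Eclipse MemoryAnalyzer.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    private static final String HOTSPOT_BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";

    /**
     * Writes a heap dump of the current JVM to {@code path} (use a name ending in ".hprof";
     * the file must not exist yet). With {@code liveOnly} set, the JVM runs a full GC first
     * and writes only reachable objects, which is what retained-heap analysis needs.
     */
    public static void dump(String path, boolean liveOnly) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    HOTSPOT_BEAN_NAME,
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, liveOnly);
        } catch (IOException e) {
            // Rethrown unchecked so callers (e.g. an admin hook) stay simple.
            throw new RuntimeException("Heap dump to " + path + " failed", e);
        }
    }
}
```

Calling HeapDumper.dump("/tmp/warm.hprof", true) after warming gives a dump comparable to the jmap command above.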

Mike McCandless

http://blog.mikemccandless.com

