Thanks a lot!

> "large text fields"
What is a good limit (in characters) to switch from StringField to TextField?
Do <Language>Analyzers (e.g. GermanAnalyzer) help a lot in reducing the size of an index?

> Add XXXDocValuesField instead of e.g. StringField.
Does this apply only to StringFields, or to TextFields too?

> Upgrade to the upcoming Lucene 4.9
We have not yet transitioned to Java 7/8 ... hopefully soon ;)

> and take a heap dump and see what's using RAM
Find attached a snippet from MemoryAnalyzer:

Class Name | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------
org.apache.lucene.index.StandardDirectoryReader @ 0x783932460 | 72 | 59'255'872 | 3.04%
|- org.apache.lucene.index.SegmentReader[24] @ 0x794089ee0 | 112 | 59'190'960 | 3.03%
| |- org.apache.lucene.index.SegmentReader @ 0x788820f40 | 72 | 16'905'072 | 0.87%
| | |- org.apache.lucene.index.SegmentCoreReaders @ 0x7910cacc8 | 56 | 16'895'576 | 0.87%
| | | |- org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader @ 0x780661c50 | 24 | 16'864'864 | 0.86%
| | | | |- org.apache.lucene.codecs.BlockTreeTermsReader @ 0x7910cae50 | 56 | 16'864'240 | 0.86%
| | | | | |- java.util.TreeMap @ 0x783902738 | 48 | 16'858'472 | 0.86%
| | | | | | '- java.util.TreeMap$Entry @ 0x77ec5f9f8 | 40 | 16'858'424 | 0.86%
| | | | | | |- java.util.TreeMap$Entry @ 0x77ec5fa20 | 40 | 10'895'656 | 0.56%
| | | | | | |- java.util.TreeMap$Entry @ 0x77ec5fa48 | 40 | 5'960'072 | 0.31%
| | | | | | | |- java.util.TreeMap$Entry @ 0x77ec5fa98 | 40 | 5'958'072 | 0.31%
| | | | | | | | |- java.util.TreeMap$Entry @ 0x77fc09bf0 | 40 | 5'949'864 | 0.30%
| | | | | | | | |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820e20 | 72 | 8'168 | 0.00%
| | | | | | | | '- Total: 2 entries | | |
| | | | | | | |- java.util.TreeMap$Entry @ 0x77ec5fa70 | 40 | 1'000 | 0.00%
| | | | | | | | '- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x78347fbc0 | 72 | 960 | 0.00%
| | | | | | | | |- org.apache.lucene.util.fst.FST @ 0x788fe34c8 | 104 | 840 | 0.00%
| | | | | | | | | |- org.apache.lucene.util.fst.FST$Arc[128] @ 0x7870932a0 | 528 | 528 | 0.00%
| | | | | | | | | |- org.apache.lucene.util.fst.BytesStore @ 0x77ec5fb60 | 40 | 144 | 0.00%
| | | | | | | | | | '- java.util.ArrayList @ 0x780663b28 | 24 | 104 | 0.00%
| | | | | | | | | |- org.apache.lucene.util.BytesRef @ 0x780663b10 | 24 | 48 | 0.00%
| | | | | | | | | | '- byte[5] @ 0x780663b58 ..... | 24 | 24 | 0.00%
| | | | | | | | | |- int[0] @ 0x780663af8 | 16 | 16 | 0.00%
| | | | | | | | | '- Total: 4 entries | | |
| | | | | | | | |- org.apache.lucene.util.BytesRef @ 0x780663ae0 | 24 | 48 | 0.00%
| | | | | | | | '- Total: 2 entries | | |
| | | | | | | |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820dd8 | 72 | 960 | 0.00%
| | | | | | | '- Total: 3 entries | | |
| | | | | | |- org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader @ 0x788820d90 | 72 | 2'656 | 0.00%
| | | | | | '- Total: 3 entries | | |
| | | | | |- org.apache.lucene.codecs.lucene41.Lucene41PostingsReader @ 0x78274ab88 | 32 | 4'032 | 0.00%
| | | | | |- org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput @ 0x788820d48 | 72 | 1'680 | 0.00%
| | | | | '- Total: 3 entries | | |
| | | | |- java.util.TreeMap @ 0x783902798 | 48 | 368 | 0.00%
| | | | |- java.util.HashMap @ 0x7839027c8 | 48 | 232 | 0.00%
| | | | '- Total: 3 entries | | |
| | | |- org.apache.lucene.index.SegmentCoreReaders$1 @ 0x78274aaa8 | 32 | 17'688 | 0.00%
| | | |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer @ 0x7822983c0 | 48 | 6'504 | 0.00%
| | | |- org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$3 @ 0x7b1424f10 | 24 | 3'456 | 0.00%
| | | |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader @ 0x7910e98c8 | 56 | 1'240 | 0.00%
| | | |- org.apache.lucene.index.SegmentCoreReaders$3 @ 0x78274aae8 | 32 | 456 | 0.00%
| | | |- org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader @ 0x77fb743a0 | 40 | 344 | 0.00%
| | | |- java.lang.String @ 0x78292d4c8 NIOFSIndexInput(path="/opt/webs/fust.ch/WEB-INF/indexes/1/fr_CH_1/fustusermanuals/full/__data/_n8.fdt") | 32 | 256 | 0.00%
| | | |- org.apache.lucene.index.SegmentCoreReaders$2 @ 0x78274aac8 | 32 | 240 | 0.00%
| | | |- java.util.Collections$SynchronizedSet @ 0x780661c68 | 24 | 216 | 0.00%
| | | |- sun.nio.ch.FileChannelImpl @ 0x782298420 | 48 | 152 | 0.00%
| | | |- java.io.RandomAccessFile @ 0x782933780 | 32 | 48 | 0.00%
| | | |- java.io.FileDescriptor @ 0x780b56148 | 24 | 40 | 0.00%
| | | |- java.util.concurrent.atomic.AtomicInteger @ 0x780661c38 | 16 | 16 | 0.00%
| | | '- Total: 14 entries | | |
------------------------------------------------------------------------------------------

Does this help?

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, June 13, 2014 13:15
To: Lucene Users
Subject: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

On Fri, Jun 13, 2014 at 3:02 AM, Clemens Wyss DEV <clemens...@mysign.ch> wrote:

>> limit how many fields have norms enabled
> We have one index for approx 7000 pdfs (24GB). Of course no content is STOREd
> (but ANALYZEd). This very index occupies 4GB on disk and the corresponding
> IndexReader is 60MB.
> Are norms enabled by default for org.apache.lucene.document.TextField?

Yes. Norms are a good idea for "large text fields", e.g. body text or a
catch-all field, but usually not a good idea for tiny fields (e.g. title).

>> use disk-based doc values not field cache
> How is this done?

Add XXXDocValuesField instead of e.g. StringField.

>> etc.
> such as? ;)

Upgrade to the upcoming Lucene 4.9; there have been some improvements,
e.g. to norms compression.

You can tune your terms index settings, but the terms index usually doesn't
use much RAM.

You can fire up your app, get all searchers warmed, and take a heap dump
and see what's using RAM.
We can iterate from there.

Mike McCandless
http://blog.mikemccandless.com
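
For reference, the two suggestions quoted above ("Add XXXDocValuesField instead of e.g. StringField" and keeping norms off tiny fields) could look roughly like the sketch below against the Lucene 4.x document API. This is only a minimal illustration, not code from this thread: the class name DocValuesSketch, the helper buildDocument, and the field names (id, id_sort, modified, title, body) are all made up for the example, and whether a doc values field is appropriate depends on how the field is actually used (sorting or faceting versus plain term lookups).

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.util.BytesRef;

// Hypothetical example class; all names here are assumptions for illustration.
public class DocValuesSketch {

    // Same settings as TextField (indexed, tokenized, not stored) but with
    // norms omitted, intended for tiny fields such as a title.
    static final FieldType TEXT_NO_NORMS = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        TEXT_NO_NORMS.setOmitNorms(true);
        TEXT_NO_NORMS.freeze();
    }

    static Document buildDocument(String id, String title, String body, long modified) {
        Document doc = new Document();
        // Exact-match lookups by id still need an indexed field.
        doc.add(new StringField("id", id, Field.Store.YES));
        // Per-document values meant for sorting: these are written as doc values
        // at index time rather than being un-inverted into the field cache
        // when a search first sorts on the field.
        doc.add(new SortedDocValuesField("id_sort", new BytesRef(id)));
        doc.add(new NumericDocValuesField("modified", modified));
        // Tiny text field: analyzed, but without norms.
        doc.add(new Field("title", title, TEXT_NO_NORMS));
        // Large text field: keep the TextField default (norms enabled).
        doc.add(new TextField("body", body, Field.Store.NO));
        return doc;
    }
}
```

A sort would then name the doc values field, e.g. a SortField on "id_sort" with SortField.Type.STRING. The doc values fields are added next to the indexed fields here, not in place of them, because a doc values field on its own is not searchable.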