Lucene stores 1 byte per document (on disk, and in RAM once that field is searched) for every field that has norms enabled, even for documents that do not contain that field.
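A rough back-of-the-envelope sketch of that rule, in plain Java with no Lucene dependency. It assumes exactly 1 byte of norms per document per field with norms enabled, and takes the document and field counts from the numbers Yuliya reported further down in this thread. The class and method names are made up for illustration.

```java
import java.util.Locale;

final class NormsSizeEstimate {
    // Assumption: 1 byte of norms per document per field with norms enabled,
    // counted for every document, including those that lack the field.
    static long normsBytes(long numDocs, int numFieldsWithNorms) {
        return numDocs * numFieldsWithNorms;
    }

    public static void main(String[] args) {
        long numDocs = 20845906L; // "Nr of documents" reported in the thread
        int numFields = 559;      // "Nr of fields" reported in the thread

        long perField = normsBytes(numDocs, 1);
        long total = normsBytes(numDocs, numFields);

        // Decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).
        System.out.println(String.format(Locale.ROOT, "per field:  %.1f MB", perField / 1e6));
        System.out.println(String.format(Locale.ROOT, "all fields: %.2f GB", total / 1e9));
    }
}
```

With those inputs the estimate comes out at roughly 20.8 MB per field and about 11.65 GB across all 559 fields.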
In your case, that's ~20 MB per field (once optimize is done), times 559 fields = ~11 GB of storage.

You should index these fields with Field.Index.ANALYZED_NO_NORMS to turn off norms. But this means field/doc boosting, and the length boosting Lucene normally does (shorter documents get a better score), will be silently disabled.

Also: you must fully re-index from scratch, otherwise the norms will turn themselves back on when segments merge together.

Mike

On Fri, Jan 8, 2010 at 7:55 AM, Yuliya Palchaninava <y...@solute.de> wrote:
> Thanks Michael.
>
> You are probably right.
>
> Not optimized size is 4.1G, optimized index is about 15G.
>
> Yes, our documents do have many different indexed fields and norms are
> enabled.
> Nr of fields: 559
> Nr of documents: 20845906
> Nr of terms: 25615389
>
> Could you please give me a more detailed explanation of how the storage of
> norms affects the size of an index?
> What exactly do you mean by "norms are not stored sparsely"?
>
> Thanks,
> Yuliya
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, January 7, 2010 18:00
>> To: java-user@lucene.apache.org
>> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as
>> large as the not optimized index
>>
>> Do your documents have many different indexed fields? If you
>> do, and norms are enabled, that could be the cause (norms are
>> not stored sparsely).
>>
>> But: what actual sizes are we talking about?
>>
>> Mike
>>
>> On Thu, Jan 7, 2010 at 11:50 AM, Yuliya Palchaninava
>> <y...@solute.de> wrote:
>> > Otis,
>> >
>> > thanks for the answer.
>> >
>> > Unfortunately the index *directory* remains larger *after* the optimization.
>> > In our case the optimization was/is completed successfully and, as you
>> > say, there is only one segment in the directory.
>> >
>> > Some other ideas?
>> >
>> > Thanks,
>> > Yuliya
>> >
>> >> -----Original Message-----
>> >> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> >> Sent: Thursday, January 7, 2010 17:35
>> >> To: java-user@lucene.apache.org
>> >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as large
>> >> as the not optimized index
>> >>
>> >> Yuliya,
>> >>
>> >> The index *directory* will be larger *while* you are optimizing.
>> >> After the optimization is completed successfully, the index directory
>> >> will be smaller. It is possible that your index directory is
>> >> large(r) because you have some left-over segments (e.g. from some
>> >> earlier failed/interrupted optimizations) that are not really a part
>> >> of the index. After optimizing, you should have only 1 segment, so
>> >> if you see more than 1 segment, look at the ones with older
>> >> timestamps. Those can be (re)moved.
>> >>
>> >> Otis
>> >> --
>> >> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>> >>
>> >> ----- Original Message ----
>> >> > From: Yuliya Palchaninava <y...@solute.de>
>> >> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
>> >> > Sent: Thu, January 7, 2010 11:23:08 AM
>> >> > Subject: Lucene 2.9 and 3.0: Optimized index is thrice as large as the
>> >> > not optimized index
>> >> >
>> >> > Hi,
>> >> >
>> >> > According to the API documentation: "In general, once the optimize
>> >> > completes, the total size of the index will be less than the size of
>> >> > the starting index. It could be quite a bit smaller (if there were
>> >> > many pending deletes) or just slightly smaller". In our case the index
>> >> > becomes not smaller but larger, namely thrice as large.
>> >> >
>> >> > The not optimized index doesn't contain compressed fields, which could
>> >> > have caused the growth of the index due to the optimization. So we
>> >> > cannot explain what happens.
>> >> >
>> >> > Does someone have an explanation for the index growth due
>> >> > to the optimization?
>> >> >
>> >> > Thanks,
>> >> > Yuliya
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org