Hi,

The reason for this is multithreaded merging. While indexing, Lucene merges segments in separate threads. Because this runs multithreaded, there is no strict "order of things". Depending on how fast the disk is or what other processes are running in parallel, merging may proceed faster or slower, producing a different index structure in which segments are merged in other combinations. That leads to different term dictionary and posting list sizes.
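
To put numbers on the variation between runs, one option is to sum the sizes of the files in the index directory after each test. The following is only a minimal sketch using plain java.nio, not anything from Lucene itself: the class name IndexSizeSketch and the method directorySize are made up for illustration, and it assumes the index lives in a single flat directory on the local file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not part of Lucene): reports the total on-disk size
// of an index directory so that sizes from different runs can be compared.
public class IndexSizeSketch {

    /** Sums the sizes, in bytes, of all regular files directly inside dir. */
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(Files::isRegularFile)
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        // Streams cannot throw checked exceptions, so wrap it.
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be the path of one index directory.
        for (String arg : args) {
            System.out.println(arg + ": " + directorySize(Paths.get(arg)) + " bytes");
        }
    }
}
```

Running it with the index directories of two test runs as arguments prints one total per directory. The sketch compares sizes only and makes no attempt to compare file contents.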
If you call forceMerge(1) at the end (this can take a very long time), the whole index is merged into one segment, which should have the same size for the same dataset. Please don't compare file MD5/SHA1 checksums: the files will *not* be identical, because the order of documents may still vary.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Jose Carlos Canova [mailto:jose.carlos.can...@gmail.com]
> Sent: Tuesday, March 25, 2014 6:36 AM
> To: java-user@lucene.apache.org
> Subject: Index size for Same DataSet.
>
> Hello,
>
> I have a doubt about index size,
> I am testing a program using Lucene to index some dataset.
>
> At the final the result of index size is varying a little, since i haven't
> finished the tests at all, i'm doubt if it is normal the index size vary
> on size among different tests.
>
> att.