2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
> Even after optimizing the index, the size is 20 gb. The size of the
> data which i want to index is about 8 GB.

Strange indeed. Just some further questions which came to my mind:

- What kind of analyzer do you use for tokenizing?
- Is the number of documents in the index correct, and has no document
  been indexed twice?

This discussion [1] may also be useful to you.

> if i add a set of fields that have the same values to the index, will
> clucene do any kind of compression?

Not directly. But as far as I understand the index format [2], each term
is stored only once in the term dictionary, and the frequency files refer
to it implicitly.

Veit

[1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
[2] http://lucene.apache.org/java/2_3_2/fileformats.html

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
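
To illustrate the point about the term dictionary: below is a toy sketch in
C++. It is not CLucene code and does not reproduce the real on-disk format.
ToyIndex, Posting, add_document, term_count and doc_freq are names made up
for this example. It only shows the general idea that a repeated field value
adds postings entries rather than another copy of the term text.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One entry in a term's postings list: which document, and how often.
struct Posting {
    int doc_id;
    int freq;
};

// Toy inverted index (hypothetical type, not a CLucene class).
// Each distinct term is a map key, so its text is stored exactly once.
class ToyIndex {
public:
    // Split text on whitespace and record each term for doc_id.
    // Assumes documents are added in order and each doc_id only once.
    void add_document(int doc_id, const std::string& text) {
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            std::vector<Posting>& postings = dictionary_[term];
            if (!postings.empty() && postings.back().doc_id == doc_id) {
                ++postings.back().freq;          // term repeated in same doc
            } else {
                postings.push_back({doc_id, 1}); // first time in this doc
            }
        }
    }

    // Number of distinct terms, i.e. entries in the term dictionary.
    std::size_t term_count() const { return dictionary_.size(); }

    // Number of documents that contain the term (0 if unknown).
    std::size_t doc_freq(const std::string& term) const {
        auto it = dictionary_.find(term);
        return it == dictionary_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<Posting>> dictionary_;
};
```

In this sketch, adding three documents that all contain "status" leaves a
single dictionary entry for that term with three postings. Stored field
values are a separate matter: as far as I know they are written per document,
so identical stored values can still add to the index size.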