2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
> Even after optimizing the index, the size is 20 GB. The size of the
> data which I want to index is about 8 GB.

Strange indeed. Just some further questions that came to mind:

- What kind of analyzer do you use for tokenizing?
- Is the correct number of documents in the index, and is no document
indexed twice?

And this discussion [1] may be useful to you.

> If I add a set of fields that have the same values to the index, will
> CLucene do any kind of compression?

Not directly. But as far as I understand the index format [2], each
term is stored only once, in the term dictionary, and the frequency
files refer to it only implicitly. So a value repeated across many
documents does not repeat its text in the index.
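To illustrate the point, here is a toy inverted index. It is a
hypothetical sketch, not CLucene code: `ToyIndex` and its methods are
made up for illustration, and the real format in [2] has more to it
(e.g. prefix compression of terms in the dictionary). It shows that a
field value shared by many documents costs its text once in the
dictionary, plus a small per-document entry in the postings when doc
ids are stored as variable-length gaps.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (hypothetical; not CLucene's API or file format).

    Each distinct term is stored once in the dictionary; adding the term to
    more documents only appends doc ids to that term's postings list.
    """

    def __init__(self):
        self.postings = defaultdict(list)  # term -> sorted list of doc ids

    def add_term(self, term, doc_id):
        docs = self.postings[term]
        # Doc ids are assumed to arrive in increasing order; skip repeats.
        if not docs or docs[-1] != doc_id:
            docs.append(doc_id)

    def dictionary_bytes(self):
        # One copy of each term's text, however many documents contain it.
        return sum(len(term.encode("utf-8")) for term in self.postings)

    def postings_bytes(self):
        # Doc ids stored as gaps, each written as a variable-length integer
        # (7 payload bits per byte), similar in spirit to the VInt doc
        # deltas described in [2].
        total = 0
        for docs in self.postings.values():
            prev = 0
            for doc in docs:
                delta = doc - prev
                prev = doc
                total += 1
                while delta >= 0x80:
                    delta >>= 7
                    total += 1
        return total


index = ToyIndex()
# Hypothetical example: 1000 documents share the field value "status:active".
for doc in range(1000):
    index.add_term("status:active", doc)

print(index.dictionary_bytes())  # 13
print(index.postings_bytes())    # 1000
```

The 13-byte term text is paid once; the remaining cost grows by about
one byte per document, because consecutive doc ids produce gaps of 1.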

Veit

[1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
[2] http://lucene.apache.org/java/2_3_2/fileformats.html

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
