I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see
the attached file.
There are no duplicate documents, and no IndexReader is open.

Ahmed
2011/2/3 Veit Jahns <nuncupa...@googlemail.com>

> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
> > Even after optimizing the index, the size is 20 GB. The size of the
> > data which I want to index is about 8 GB.
>
> Strange indeed. Just some further questions which came into my mind:
>
> - What kind of analyzer do you use for tokenizing?
> - Is the correct number of documents in the index, and is no document
> indexed twice?
>
> And this discussion [1] may be useful to you.
>
> > If I add a set of fields that have the same values to the index, will
> > CLucene do any kind of compression?
>
> Not directly. But as far as I understand the index format [2], the
> terms are only stored in the term dictionary and are referenced
> in an implicit manner in the frequency files.
>
> Veit
>
> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
>
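
To illustrate the point quoted above about terms being stored only once in
the term dictionary, here is a minimal sketch of a toy inverted index. It is
not CLucene code and does not reflect CLucene's file layout: the class
ToyIndex and its methods are hypothetical names made up for this example, it
tokenizes on whitespace only, and it assumes each document is added exactly
once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy inverted index (hypothetical, for illustration only; not CLucene).
class ToyIndex {
public:
    // Split the text on whitespace and record one posting per (term, document).
    void add_document(int doc_id, const std::string& text) {
        assert(doc_id >= 0);  // assumption of this sketch: ids are non-negative
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            // operator[] creates the dictionary entry only the first time
            // a term is seen; later documents reuse the same entry.
            std::vector<int>& docs = postings_[term];
            if (docs.empty() || docs.back() != doc_id) {
                docs.push_back(doc_id);
            }
        }
    }

    // Number of distinct terms, i.e. the size of the toy "term dictionary".
    std::size_t dictionary_size() const { return postings_.size(); }

    // Total number of (term, document) postings across all terms.
    std::size_t posting_count() const {
        std::size_t n = 0;
        for (const auto& entry : postings_) {
            n += entry.second.size();
        }
        return n;
    }

private:
    std::map<std::string, std::vector<int>> postings_;
};
```

Adding three documents that all contain the text "same value" leaves
dictionary_size() at 2, while posting_count() grows to 6. In this toy model
the term text is kept once, but every document still adds a posting per term.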

Attachment: arabic-analyzer.tar.gz
Description: GNU Zip compressed data

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
