I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see the attached file. There are no duplicate documents, and no IndexReader is open.
Ahmed

2011/2/3 Veit Jahns <nuncupa...@googlemail.com>

> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
> > Even after optimizing the index, the size is 20 GB. The size of the
> > data which I want to index is about 8 GB.
>
> Strange indeed. Just some further questions which came to my mind:
>
> - What kind of analyzer do you use for tokenizing?
> - Is the correct number of documents in the index, and is no document
>   indexed twice?
>
> And this discussion [1] may be useful to you.
>
> > If I add a set of fields that have the same values to the index, will
> > CLucene do any kind of compression?
>
> Not directly. But as far as I understand the index format [2], the
> terms are stored only in the term dictionary and are referenced
> implicitly in the frequency files.
>
> Veit
>
> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
arabic-analyzer.tar.gz
Description: GNU Zip compressed data
_______________________________________________ CLucene-developers mailing list CLucene-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/clucene-developers