I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see the attached file. There are no duplicate documents, and no IndexReader is open.
Ahmed

2011/2/3 Veit Jahns <nuncupa...@googlemail.com>

> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
> > Even after optimizing the index, the size is 20 GB. The size of the
> > data which I want to index is about 8 GB.
>
> Strange indeed. Just some further questions which came to my mind:
>
> - What kind of analyzer do you use for tokenizing?
> - Is the number of documents in the index correct, with no document
>   indexed twice?
>
> And this discussion [1] may be useful to you.
>
> > If I add a set of fields that have the same values to the index, will
> > CLucene do any kind of compression?
>
> Not directly. But as far as I understand the index format [2], the
> terms are stored only in the term dictionary and are referenced
> implicitly in the frequency files.
>
> Veit
>
> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
arabic-analyzer.tar.gz
Description: GNU Zip compressed data
_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers