The problem is solved. It was my mistake: by accident I had stored the file text without tokenization in the categorie field! Thanks for your help.
Ahmed

2011/2/3, Ben van Klinken <bvanklin...@gmail.com>:
> Stored fields are kept as plain text. It is possible to compress the
> fields if it is a lot of data, but you could look into not storing
> certain fields (but of course you won't be able to retrieve the data
> out of the document after a search). Depending on your requirements
> this may be interesting.
>
> Another thing I suggest is looking at the index using a tool called
> 'luke' (http://www.getopt.org/luke/). You can analyse what's going
> on, see how much data there is, perhaps run the check index tool,
> check to see if there are any extra segments that aren't used, etc.
>
> Hope that helps
> ben
>
> On Fri, Feb 4, 2011 at 7:00 AM, Ahmed Saidi <ci7nu...@gmail.com> wrote:
>> I'm using an Arabic analyzer; it analyzes only Arabic characters.
>> Please see the attached file.
>> There is no duplicate document, and no IndexReader is open.
>>
>> Ahmed
>>
>> 2011/2/3 Veit Jahns <nuncupa...@googlemail.com>
>>>
>>> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
>>> > Even after optimizing the index, the size is 20 GB. The size of the
>>> > data which I want to index is about 8 GB.
>>>
>>> Strange indeed. Just some further questions which came to my mind:
>>>
>>> - What kind of analyzer do you use for tokenizing?
>>> - Is the correct number of documents in the index, and is no
>>>   document indexed twice?
>>>
>>> And this discussion [1] may be useful to you.
>>>
>>> > If I add a set of fields that have the same values to the index, will
>>> > CLucene do any kind of compression?
>>>
>>> Not directly. But as far as I understand the index format [2], the
>>> terms are only stored in the term dictionary and are referenced
>>> implicitly in the frequency files.
>>>
>>> Veit
>>>
>>> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
>>> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
>
> --
> -------------------------------------
> Ben van Klinken
>
> Mob: 0401 921847
> Em: b...@villagechief.com

--
Sent from my mobile

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
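
For anyone who hits the same symptom, here is a rough sketch of why an untokenized field holding whole file texts can inflate an index. It is a toy model in Python, not CLucene code and not the real index file format: it only assumes, as Veit describes above, that each distinct term is stored once in the term dictionary. The function name, the whitespace "analyzer", and the sample documents are all made up for illustration.

```python
# Toy model, NOT CLucene's real file format: it only counts the bytes a
# term dictionary would need if every distinct term is stored exactly once.

def term_dictionary_bytes(docs, tokenize):
    """Sum the UTF-8 size of each distinct term across all documents."""
    terms = set()
    for text in docs:
        if tokenize:
            # Whitespace splitting stands in for a real analyzer (assumption).
            terms.update(text.split())
        else:
            # Untokenized: the whole field value becomes a single term.
            terms.add(text)
    return sum(len(term.encode("utf-8")) for term in terms)


# Hypothetical sample data: three documents that share most of their words.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox sleeps under the lazy dog",
    "the lazy dog jumps over the quick brown fox",
]

tokenized_size = term_dictionary_bytes(docs, tokenize=True)
untokenized_size = term_dictionary_bytes(docs, tokenize=False)
raw_size = sum(len(d.encode("utf-8")) for d in docs)

print("tokenized terms:  ", tokenized_size, "bytes")
print("untokenized terms:", untokenized_size, "bytes")
print("raw text:         ", raw_size, "bytes")
```

In this model the tokenized dictionary stays small because the documents share words, while the untokenized one is as large as the raw text, since no two field values are identical. If the same text is also stored, it is kept a second time as plain text, which would be consistent with an index larger than the source data.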