The problem is solved. It was my mistake: by accident I had stored the
file text, without tokenization, in the category field!
Thanks for your help.
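The sketch below is a toy model in plain C++ that shows why this can inflate an index so much. It is not the CLucene API. `dictBytesUntokenized` and `dictBytesTokenized` are hypothetical names, and whitespace splitting stands in for a real analyzer. When a whole file's text goes into an untokenized field, each document contributes one long, unique term, so the term dictionary shares nothing. This comes on top of any stored copy. When the field is tokenized, the text of a repeated word is kept only once.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy model (hypothetical, not CLucene): bytes of term text a term dictionary
// would hold for one field, counting each distinct term once.

// Untokenized: the whole field value becomes a single term.
std::size_t dictBytesUntokenized(const std::vector<std::string>& values) {
    std::set<std::string> terms(values.begin(), values.end());
    std::size_t bytes = 0;
    for (const auto& t : terms) bytes += t.size();
    return bytes;
}

// Tokenized: split on whitespace (a stand-in for a real analyzer).
std::size_t dictBytesTokenized(const std::vector<std::string>& values) {
    std::set<std::string> terms;
    for (const auto& v : values) {
        std::istringstream in(v);
        std::string tok;
        while (in >> tok) terms.insert(tok);
    }
    std::size_t bytes = 0;
    for (const auto& t : terms) bytes += t.size();
    return bytes;
}
```

Take the two short values `"a b c"` and `"a b d"`. The untokenized model keeps 10 bytes of term text and the tokenized model keeps 4. With file-sized values, the gap grows with the size of each document.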

Ahmed

2011/2/3, Ben van Klinken <bvanklin...@gmail.com>:
> Stored fields are kept as plain text. It is possible to compress the
> fields if there is a lot of data, but you could also look into not storing
> certain fields (of course, you then won't be able to retrieve that data
> from the document after a search). Depending on your requirements,
> this may be worth considering.
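Here is a rough way to reason about the suggestion above. Only stored fields leave a verbatim copy of their value in the stored-field data. The sketch uses a hypothetical `ToyField` struct with a single `stored` flag. CLucene's real `Field` class has its own store and index options. The sketch sums the bytes that would be kept verbatim for one document and ignores compression and per-field overhead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field description for illustration only; CLucene's Field
// class uses its own store/index flags, not this struct.
struct ToyField {
    std::string name;
    std::string value;
    bool stored;  // true: a verbatim copy of value is kept for retrieval
};

// Rough count of value bytes that end up in stored fields for one document.
// Indexed-only fields still cost term-dictionary and postings space, but
// their values cannot be read back after a search.
std::size_t storedBytes(const std::vector<ToyField>& doc) {
    std::size_t bytes = 0;
    for (const auto& f : doc)
        if (f.stored) bytes += f.value.size();
    return bytes;
}
```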
>
> Another thing I suggest is looking at the index with a tool called
> Luke (http://www.getopt.org/luke/). You can analyse what's going
> on, see how much data there is, perhaps run the check index tool,
> and check whether there are any extra segments that aren't used, etc.
>
> hope that helps
> ben
>
> On Fri, Feb 4, 2011 at 7:00 AM, Ahmed Saidi <ci7nu...@gmail.com> wrote:
>> I'm using an Arabic analyzer; it analyzes only Arabic characters. Please
>> see the attached file.
>> There are no duplicate documents, and no IndexReader is open.
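Here is a minimal sketch of what an analyzer that "analyzes only Arabic characters" might do. It assumes tokens are runs of code points in the basic Arabic Unicode block, U+0600 to U+06FF, which also contains Arabic punctuation and Arabic-Indic digits. Every other character is treated as a separator. The function names are hypothetical, and this is not the analyzer attached to the original message. A real Arabic analyzer would also normalize and stem.

```cpp
#include <string>
#include <vector>

// Assumption: "Arabic character" = code point in the basic Arabic block.
bool isArabic(char32_t c) { return c >= 0x0600 && c <= 0x06FF; }

// Emit each maximal run of Arabic-block code points as a token; anything
// else (Latin letters, ASCII digits, spaces, punctuation) ends the run.
std::vector<std::u32string> arabicTokens(const std::u32string& text) {
    std::vector<std::u32string> tokens;
    std::u32string current;
    for (char32_t c : text) {
        if (isArabic(c)) {
            current += c;
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```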
>>
>> Ahmed
>>
>>> 2011/2/3 Veit Jahns <nuncupa...@googlemail.com>
>>>>
>>>> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
>>>> > Even after optimizing the index, its size is 20 GB. The size of the
>>>> > data I want to index is about 8 GB.
>>>>
>>>> Strange indeed. Just some further questions which came into my mind:
>>>>
>>>> - What kind of analyzer do you use for tokenizing?
>>>> - Does the index contain the correct number of documents, with no
>>>> document indexed twice?
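The second question can be checked before indexing. This sketch assumes each document has a unique key, such as its file path. The thread does not say how documents are identified, so that part is an assumption. The sketch returns every key that occurs more than once. `duplicateKeys` is a hypothetical helper.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pre-indexing check: report each key that appears more than
// once, in the order its first repeat is seen, listing every key only once.
std::vector<std::string> duplicateKeys(const std::vector<std::string>& keys) {
    std::unordered_set<std::string> seen;
    std::unordered_set<std::string> reported;
    std::vector<std::string> dups;
    for (const auto& k : keys) {
        // insert().second is false when k was already present
        if (!seen.insert(k).second && reported.insert(k).second)
            dups.push_back(k);
    }
    return dups;
}
```

If the result is empty, the number of distinct keys should match the document count that the index reports.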
>>>>
>>>> And this discussion [1] may be useful to you.
>>>>
>>>> > If I add a set of fields that have the same values to the index, will
>>>> > CLucene do any kind of compression?
>>>>
>>>> Not directly. But as far as I understand the index format [2], each
>>>> term is stored only once, in the term dictionary, and is referenced
>>>> implicitly from the frequency files.
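A toy inverted index illustrates the point. Each distinct term's text is kept once in a dictionary, and documents appear only as numbers in that term's postings list. This is an in-memory illustration under simplifying assumptions, not CLucene's on-disk format. It uses a `std::map`, records no frequencies or positions, and assumes documents are added in increasing id order. `ToyIndex` is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy inverted index (illustration only). The map key holds each distinct
// term's text once; the postings are just document ids for that term.
class ToyIndex {
public:
    // Assumes docId values arrive in increasing order.
    void add(int docId, const std::vector<std::string>& terms) {
        for (const auto& t : terms) {
            auto& list = postings_[t];
            if (list.empty() || list.back() != docId) list.push_back(docId);
        }
    }
    std::size_t distinctTerms() const { return postings_.size(); }
    std::size_t termTextBytes() const {  // term text, counted once per term
        std::size_t b = 0;
        for (const auto& p : postings_) b += p.first.size();
        return b;
    }
    std::size_t totalPostings() const {  // one entry per (term, document)
        std::size_t n = 0;
        for (const auto& p : postings_) n += p.second.size();
        return n;
    }
private:
    std::map<std::string, std::vector<int>> postings_;  // term -> doc ids
};
```

Suppose the terms `category` and `news` are added to 1000 documents. The index keeps 12 bytes of term text but 2000 postings entries, so repeated values cost postings rather than repeated term text.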
>>>>
>>>> Veit
>>>>
>>>> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
>>>> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
>>>>
>>>>
>>>> _______________________________________________
>>>> CLucene-developers mailing list
>>>> CLucene-developers@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>>>
>>
>
> --
> -------------------------------------
> Ben van Klinken
>
> Mob: 0401 921847
> Em: b...@villagechief.com
>
>

-- 
Sent from my mobile
