I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see
the attached file.
There are no duplicate documents, and no IndexReader is open.

Ahmed

2011/2/3 Ahmed Saidi <ci7nu...@gmail.com>

> I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see
> the attached file.
> There are no duplicate documents, and no IndexReader is open.
>
> Ahmed
> 2011/2/3 Veit Jahns <nuncupa...@googlemail.com>
>
> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
>> > Even after optimizing the index, its size is 20 GB. The data I want
>> > to index is about 8 GB.
>>
>> Strange indeed. Just some further questions which came into my mind:
>>
>> - What kind of analyzer do you use for tokenizing?
>> - Is the number of documents in the index correct, with no document
>> indexed twice?
>>
>> And this discussion [1] may be useful to you.
>>
>> > If I add a set of fields that have the same values to the index, will
>> > CLucene do any kind of compression?
>>
>> Not directly. But as far as I understand the index format [2], each
>> term is stored only once in the term dictionary and is referenced
>> implicitly from the frequency files.
>>
>> Veit
>>
>> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
>> [2] http://lucene.apache.org/java/2_3_2/fileformats.html
>>
>>
>>
>
>
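
To illustrate Veit's point about the term dictionary, here is a minimal
sketch of a toy inverted index in C++. It is not CLucene's implementation
or its on-disk format; ToyIndex, addDocument, distinctTerms and
postingCount are hypothetical names made up for this example, and
tokenizing on whitespace is an assumption. It only shows the general idea
that a term is kept once in the dictionary while each document that
contains it adds a posting.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy inverted index for illustration only (hypothetical, not CLucene code).
struct ToyIndex {
    // Term dictionary: each distinct term is stored exactly once as a key.
    // The mapped value is the postings list of (document id, frequency).
    std::map<std::string, std::vector<std::pair<int, int>>> postings;

    // Split the text on whitespace, count each term, and append one
    // posting per distinct term of this document.
    void addDocument(int docId, const std::string& text) {
        std::map<std::string, int> freq;
        std::istringstream in(text);
        std::string token;
        while (in >> token) {
            ++freq[token];
        }
        for (const auto& entry : freq) {
            postings[entry.first].push_back({docId, entry.second});
        }
    }

    // Number of entries in the term dictionary.
    std::size_t distinctTerms() const { return postings.size(); }

    // Total number of postings across all terms.
    std::size_t postingCount() const {
        std::size_t n = 0;
        for (const auto& entry : postings) {
            n += entry.second.size();
        }
        return n;
    }
};
```

In this toy model, adding three documents that all contain the same text
"open source search" leaves three terms in the dictionary but produces
nine postings, so repeated values grow the postings rather than the
dictionary.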

Attachment: arabic-analyzer.tar.gz
Description: GNU Zip compressed data

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers