Re: [Ferret-talk] Hitting Files per Directory Limits with Ferret?

Ewout Fri, 05 Jan 2007 10:06:26 -0800

If Ferret were to implement automatic optimization, it should indeed be
optional and configurable.


For example, suppose you are indexing 500,013 documents. After indexing,
you would naturally call index.optimize. But suppose Ferret also
optimized automatically every 1000 insertions. That adds a lot of
overhead: 501 optimizations instead of just one.
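The count of 501 works out as 500 automatic runs (one per full batch of 1000; the last 13 documents never complete a batch) plus the one manual call at the end. Here is a quick Ruby check of that arithmetic. `optimize_calls` is a hypothetical helper name, and the second line shows the same count for the 100-insertion interval used in the patch quoted below.

```ruby
# Hypothetical helper: how many optimize calls happen when indexing `total`
# documents with an automatic optimize every `every` insertions, plus the
# single manual index.optimize at the end.
def optimize_calls(total, every)
  automatic = total / every # integer division: one call per full batch
  automatic + 1             # plus the final manual call
end

puts optimize_calls(500_013, 1000) # prints 501
puts optimize_calls(500_013, 100)  # prints 5001
```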

The ideal solution would be parallel:
- index optimization happens in a separate process
- while optimizing, the old index remains available
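The two points above amount to building the optimized copy off to the side and swapping it in only once it is complete. Below is a minimal in-process sketch of that swap pattern. A Ruby thread stands in for the separate process and a plain Struct stands in for the index. `Snapshot`, `SwappableIndex` and `optimize_in_background` are hypothetical names, not Ferret API, and the sketch says nothing about whether Ferret itself is safe to use this way.

```ruby
# Hypothetical stand-in for an index: a list of segments,
# each segment being an array of documents.
Snapshot = Struct.new(:segments)

# Hypothetical wrapper: readers always see a complete snapshot, while an
# "optimized" copy is built in a background thread and then swapped in.
class SwappableIndex
  def initialize(snapshot)
    @current = snapshot
    @lock = Mutex.new
  end

  # Readers get whichever snapshot is current, never a half-built one.
  def current
    @lock.synchronize { @current }
  end

  # Merge all segments of the current snapshot into one segment in the
  # background, then swap the new snapshot in under the lock.
  # Returns the Thread so callers can join it.
  def optimize_in_background
    old = current
    Thread.new do
      merged = Snapshot.new([old.segments.flatten])
      @lock.synchronize { @current = merged }
    end
  end
end

index = SwappableIndex.new(Snapshot.new([[1, 2], [3], [4, 5]]))
reader_view = index.current # a reader still holding the old snapshot
index.optimize_in_background.join
p reader_view.segments   # => [[1, 2], [3], [4, 5]]  (old view still usable)
p index.current.segments # => [[1, 2, 3, 4, 5]]
```

The sketch ignores documents added while the background merge runs; a plain swap would drop them, so a real implementation would have to block writers during the merge or replay their additions afterwards.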

Is this possible now? Is Ferret safe enough to allow one process to
optimize the index while another is using it?

Also, does anyone have data on how long an optimization takes? I
don't think it takes very long, but I don't have any concrete numbers
on that (yet).
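One way to collect such numbers is to wrap the call in a timer and, since this thread started with a files-per-directory limit, to record the directory's file count alongside it. `measure_optimize` is a hypothetical helper, not part of Ferret. It only assumes that the index object has an `optimize` method and that you know which directory the index is stored in.

```ruby
require 'benchmark'

# Hypothetical helper: time a single optimize call and count the files in the
# index directory before and after it. `index` is any object responding to
# #optimize; `dir` is the directory the index is stored in.
def measure_optimize(index, dir)
  files_before = Dir.children(dir).size
  seconds = Benchmark.realtime { index.optimize }
  files_after = Dir.children(dir).size
  { seconds: seconds, files_before: files_before, files_after: files_after }
end
```

Running it against the index you would normally call index.optimize on gives both the duration and how much the file count shrank.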

Ewout


>Ferret does not automatically optimize itself after a certain number of
>document insertions?
>
>Lucene does, but maybe Ferret does not?  It certainly causes indexing
>hiccups with Lucene when it hits that optimization, so you have to take
>care to account for the possible optimization delay, or tune the
>parameters so you know when to expect it.
>
>       Erik
>
>
>On Jan 5, 2007, at 10:02 AM, Ewout wrote:
>
>> I created a patch for acts_as_ferret that optimizes the index every
>> 100 insertions (experience will have to show whether this constant is
>> adequate).
>>
>> The only prerequisite is that your model has an id attribute that
>> automatically increments by 1, since the id is used to determine when
>> to optimize.
>>
>> Just apply this patch to instance_methods.rb of acts_as_ferret to try it.
>>
>> Hope this will be of use.
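The patch itself is only attached, not quoted, so the following is not its code. It is a minimal sketch of the rule described above, under the assumption that optimization fires when a freshly saved id is a multiple of 100. `OPTIMIZE_EVERY`, `optimize_due?` and `add_to_index` are hypothetical names, and the index is any object that responds to `<<` and `optimize`.

```ruby
# Hypothetical constant matching the "every 100 insertions" described above.
OPTIMIZE_EVERY = 100

# Assumes ids increase by exactly 1 per insert, so a multiple of
# OPTIMIZE_EVERY is reached once per batch of OPTIMIZE_EVERY records.
def optimize_due?(id, every = OPTIMIZE_EVERY)
  (id % every).zero?
end

# Hypothetical stand-in for the model's after-save hook: add the record to
# the index, then optimize if its id marks the end of a batch.
def add_to_index(index, record)
  index << record
  index.optimize if optimize_due?(record[:id])
end
```

Because the trigger depends on the id alone, any id that never passes through the hook (for example an insert that consumed an id and was then rolled back, or records created outside the model) can land on a multiple of 100 and silently skip that optimization, which is why the sequential-id prerequisite matters.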
>>
>>> Actually, it does not. The only call to index.optimize is in the
>>> rebuild_index method. A possible extension for aaf would be to call
>>> index.optimize automatically every C insertions, where C is some
>>> constant (1000 seems reasonable).
>>>
>>> I can only agree with Jan on scalability: at the moment I'm keeping an
>>> index of over 700,000 bibliographic records, and searches are instant.
>>>
>>> Regards,
>>> Ewout
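The proposal quoted above, calling index.optimize every C insertions, can also be sketched without the sequential-id prerequisite by counting insertions in a wrapper. `AutoOptimizingIndex` is a hypothetical name, C is the `every:` keyword (defaulting to the suggested 1000), and the wrapped index is assumed to be any object responding to `<<` and `optimize`.

```ruby
# Hypothetical wrapper: counts insertions itself (no reliance on record ids)
# and calls optimize on the wrapped index after every `every` insertions.
class AutoOptimizingIndex
  attr_reader :inserted

  def initialize(index, every: 1000)
    @index = index
    @every = every
    @inserted = 0
  end

  # Add a document; optimize when a full batch of `every` has been inserted.
  def <<(doc)
    @index << doc
    @inserted += 1
    @index.optimize if (@inserted % @every).zero?
    self
  end
end
```

The trade-off is that the counter lives in memory: it restarts from zero whenever the process restarts, and several processes writing to the same index each keep their own count.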
>>>
>>>> Hey Fez,
>>>>
>>>> the number of items Ferret (and Lucene) can index should be limited in
>>>> the millions, not the thousands. I've indexed hundreds of thousands of
>>>> documents myself with both Ferret and Lucene, and 20,000 is not even
>>>> near the limit. Regarding the file count in the index directory: it
>>>> looks as if the index was never optimized. Optimizing merges these
>>>> chunks into one big index file. You should investigate why this didn't
>>>> happen. I haven't looked at the aaf code for some time, but I think it
>>>> should optimize the index from time to time.
>>>>
>>>> Cheers,
>>>> Jan
>>>>
>>>>
>>>> --
>>>> ------------------------------
>>>> http://www.inviado.de - Websites for lawyers
>>>> http://www.xing.com/profile/Jan_Prill
>>>> _______________________________________________
>>>> Ferret-talk mailing list
>>>> [email protected]
>>>> http://rubyforge.org/mailman/listinfo/ferret-talk
>>>
>>> <acts_as_ferret_optimize.patch>
>

