The bootstrap indexing actually ended up taking twice as long as the
time listed below.  When there was no index directory and I called
rebuild_index, the ferret_index.log file had these lines in it:
# Logfile created on Thu Jun 07 08:46:34 -0400 2007 by logger.rb/1.5.2.9
rebuild index: []
reindexing model CurrentProgram
reindex model CurrentProgram : 0.00% complete : 25658.57 secs to finish
...

When it hit 100%, the following lines appeared:
reindex model CurrentProgram : 99.56% complete : 219.29 secs to finish
Created Ferret index in:
./script/../config/../config/../index/production/current_program
rebuild index: [["CurrentProgram"]]
reindexing model CurrentProgram
reindex model CurrentProgram : 0.00% complete : 25740.65 secs to finish
reindex model CurrentProgram : 0.95% complete : 26065.95 secs to finish


So it looks like, for some reason, it performed the rebuild twice. :(
When I looked at it this morning, there were over 116k files in the
current_program directory, which is not healthy.  I ran
CurrentProgram.aaf_index.ferret_index.optimize; it took a few minutes
and optimized the index down to three files.
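
For anyone who wants to watch the file count while a rebuild runs, here
is a minimal sketch.  The helper name index_file_count is made up for
illustration, and the directory is an assumption based on the "Created
Ferret index in:" log line above; adjust it for your own setup.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret):
# count the regular files directly inside an index directory.
def index_file_count(dir)
  Dir.entries(dir).count { |name| File.file?(File.join(dir, name)) }
end

# Assumed location, relative to the Rails root, taken from the log above.
index_dir = File.join("index", "production", "current_program")

if File.directory?(index_dir)
  puts "#{index_file_count(index_dir)} files in #{index_dir}"
else
  puts "no index directory at #{index_dir}"
end
```

Running it before and after the optimize call should show the count
dropping, as it did here from over 116k to three.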


I applied the suggested testing patch and it is running now.  I did
not delete the index directory.  The ferret_index.log started out with
these lines:
rebuild index: [["CurrentProgram"]]
reindexing model CurrentProgram
reindex model CurrentProgram : 0.00% complete : 3540.78 secs to finish
reindex model CurrentProgram : 0.95% complete : 3510.69 secs to finish

So it takes significantly less time when it isn't actually adding
the doc to the index.
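
To put the two estimates side by side, here is a small sketch that pulls
the "secs to finish" figure out of the first progress line of each run.
The helper name estimated_seconds is hypothetical, and these are only the
logger's initial estimates at 0.00%, not measured totals.

```ruby
# Progress lines copied from ferret_index.log as quoted above.
WITH_INDEXING    = "reindex model CurrentProgram : 0.00% complete : 25658.57 secs to finish"
WITHOUT_INDEXING = "reindex model CurrentProgram : 0.00% complete : 3540.78 secs to finish"

# Hypothetical helper: extract the "secs to finish" estimate from a
# progress line, or return nil if the line is not a progress line.
def estimated_seconds(line)
  match = line.match(/:\s*([\d.]+)\s+secs to finish/)
  match && match[1].to_f
end

with_secs    = estimated_seconds(WITH_INDEXING)
without_secs = estimated_seconds(WITHOUT_INDEXING)

puts format("with index writes:    %.1f hours", with_secs / 3600)
puts format("without index writes: %.1f minutes", without_secs / 60)
puts format("ratio:                %.1fx", with_secs / without_secs)
```

That works out to roughly 7 hours estimated with index writes against
about an hour without them.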

If you have any further ideas on things to try or any other
information you'd like to collect, please let me know.  In the
meantime, I'm going to try out the acts_as_solr plugin since I've had
a bit more experience with tuning solr and see what the indexing
performance on that looks like.

Daniel


On 6/8/07, Jens Kraemer <[EMAIL PROTECTED]> wrote:
> On Thu, Jun 07, 2007 at 05:19:26PM +0000, Daniel Einspanjer wrote:
> > I am looking at trying to use ferret/aaf to supplement my querying against a
> > medium and large table with lots of columns.  Some facts first:
> >
> > Ferret 0.11.4
> > AAF 0.4.0
> > Ruby 1.8.6
> > Rails 1.2.3
> >
> > Medium table:
> > 105,464 rows
> > 168 columns (mostly varchar(20))
> > 11 actual columns indexed in aaf plus
> > 40 virtual columns indexed in aaf (virtual is concat of two physical
> > columns, e.g. cast_first_name_1 + cast_last_name_1 through
> > cast_first_name_20 + cast_last_name_20)
> >
> > Large table:
> > 1,244,716 rows
> > same column/index structure
> >
> > These tables are not updated via Ruby, only read.  I am trying to use
> > rebuild_index to bootstrap the medium sized table and it is taking a
> > very long time (running for about 4 hours, indicates 50% complete with
> > 4 hours remaining) and creating a massive number of files in the index
> > directory (currently about 65k, was 90k earlier)
>
> strange. Ferret is faster than that - I have a test script that builds
> an index of 100000 documents with 50 fields each containing a single random
> word in under 10 Minutes here on standard hardware.
>
> Maybe the problem is something else? For starters, change line 220
> of local_index.rb from
> index << rec.to_doc if rec.ferret_enabled?(true)
> to
> doc = rec.to_doc if rec.ferret_enabled?(true)
>
> so nothing is added to the index. How long does that take?
>
> Jens
>
> --
> Jens Krämer
> webit! Gesellschaft für neue Medien mbH
> Schnorrstraße 76 | 01069 Dresden
> Telefon +49 351 46766-0 | Telefax +49 351 46766-66
> [EMAIL PROTECTED] | www.webit.de
>
> Amtsgericht Dresden | HRB 15422
> GF Sven Haubold, Hagen Malessa
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
