Caffeinate The World wrote:
> 
> it makes sense to me too. i was rather surprised that it started
> dragging this early, at around 160,000 indexed URLs. i was predicting
> around 500,000 or so.
> 
> another thing i forgot to mention is that the indexer also slowed down
> considerably. i used to be able to index and have the log file grow to
> about 30mb in a couple of hours, but now, in the same time, i get about
> 1mb log files.


Probably adding an index on next_index_time will help, but I'm not
sure. The way to check is to add #define DEBUG_SQL in sql.c
and verify that the bottleneck is this query:

   SELECT ... FROM url WHERE next_index_time>=XXX ....

where XXX is the current unix timestamp. indexer retrieves the targets
to be indexed using this query. Executed on a huge table without
a key on next_index_time, it may take a while.
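Assuming the backend here is MySQL, the fix would be along the lines of `CREATE INDEX url_next_index_time ON url (next_index_time);` (the index name is made up). The effect of the missing key can be sketched with SQLite's query planner — a minimal illustration of the principle, not the actual mnoGoSearch schema:

```python
import sqlite3

# Toy version of the url table; column names are taken from the query
# above, the index name url_next_index_time is an invented example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE url (url TEXT, next_index_time INTEGER)")
con.executemany("INSERT INTO url VALUES (?, ?)",
                (("http://host/%d" % i, i) for i in range(100000)))

query = "SELECT * FROM url WHERE next_index_time >= 50000"

# Without a key on next_index_time the planner must scan every row:
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

con.execute("CREATE INDEX url_next_index_time ON url (next_index_time)")

# With the index, the same query becomes a range search on the index:
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

The query plan changes from a full table scan to an index range lookup, which is exactly the difference that grows painful as the table passes 160,000 rows.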


  Please give feedback!



> --- Zenon Panoussis <[EMAIL PROTECTED]> wrote:
> >
> >
> > Caffeinate The World wrote:
> > >
> > > it seems as though as the ./tree is built up, splitter seems to
> > > take progressively longer to do its work. i've been watching it
> > > for days now. before, it could split a 31mb (thanks for that bug
> > > fix) file in a matter of hours; now it takes days. even on smaller
> > > log files, it seems to take 3 or 4 times longer.
> >
> > Isn't that logical? Reading and rewriting a small word file
> > takes less time than doing the same with a big one, and
> > when you have a million of them I guess the difference becomes
> > noticeable. I imagine that the way to solve this at the splitter
> > end would be to increase that 31 MB limit, so that more can get
> > done every time splitter is run. At the user's end I guess
> > the solution is called "big fat RAID array" and has a salty
> > price tag ;)
___________________________________________
If you want to unsubscribe send "unsubscribe general"
to [EMAIL PROTECTED]