I have written a set of scripts that let htdig read through all of the posts in my vBulletin forum.  They allow htdig to view every post on its own page.  I then rewrite the URLs so that users see them in the “pretty” form when they do a search.

 

My problem is that my forum has almost 1M posts, which means 1M pages for htdig to index.

 

I let it run for about 8 hours and it only dug about 20% of them, so at that rate a full dig would take roughly 40 hours.  I need to find a way to make the indexing more palatable to the server and am hoping someone here can help.

 

Options I have considered:

 

1) Run one big dig (all 1M posts), then run nightly digs of the posts from the last 24-36 hours and merge the dbs.

 

2) Break the posts up into blocks of ~50-100k pages, index each block separately, then merge the dbs.
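
A rough Python sketch of how option 2 could cut the posts into blocks. It assumes post IDs are roughly contiguous integers starting at 1, which is not confirmed anywhere above, and `post_blocks` is a hypothetical helper, not part of htdig:

```python
def post_blocks(total_posts, block_size):
    """Yield (first_id, last_id) pairs covering post IDs 1..total_posts.

    Assumption: IDs are contiguous integers starting at 1.
    """
    for first in range(1, total_posts + 1, block_size):
        # The last block is clipped so it never runs past total_posts.
        yield first, min(first + block_size - 1, total_posts)


# 1M posts in blocks of 100k pages gives 10 separate digs.
blocks = list(post_blocks(1_000_000, 100_000))
print(len(blocks))
print(blocks[0], blocks[-1])
```

Each pair would then drive one separate dig, with the resulting dbs merged afterwards; how an ID range maps to start URLs depends on the scripts that expose the posts.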

 

 

How do you guys update your dbs?  Do I need to reindex them all every time?

 

Please help.

 

Also, how can I search multiple dbs at once in 3.2?  Are there any docs for 3.2?

 

Thanks

 

-Rylan

 

 

 
