Hi Manuel,
htdig -i forces a 'from scratch' recrawl.
By default, htdig traverses the existing index and issues HEAD requests to see whether a page has changed, which is exactly what you described below.
Please make sure you have 'head_before_get' enabled.
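As a rough sketch, the setting would look something like this in your htdig configuration file. The attribute name is the one mentioned above; the rest of your file is assumed to stay as it is, and the default value may differ between versions, so it is worth setting explicitly:

```
# htdig.conf fragment (sketch only; other attributes left unchanged)
# Issue an HTTP HEAD request for each known page before deciding
# whether to fetch it again with GET.
head_before_get: true
```

With that in place, run htdig without -i for the update pass, and keep -i only for the occasional full rebuild.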
What version are you using?
Thanks
On Thu, 4 Nov 2004, Manuel Lemos wrote:
Hello,
I tried the general list but it seems nobody could help. Let's see if anybody can help here:
I have been using htdig for years to crawl a site that now has over 10,000 pages. Since the pages may go through many changes, I have been reindexing the whole site once a day.
However, this lazy indexing approach is taking too many resources.
Therefore I am looking into a better approach: keeping a list of only the pages that have changed and reindexing just those pages on a much shorter cycle than what I am doing now.
My question is how can I reindex just a few pages at once and merge the crawled pages with a previously indexed site database? I mean, index only a few pages that I list and only follow links to site pages that were not yet indexed.
--
Neal Richter Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
