Hello,

I tried the general list, but it seems nobody could help. Let's see if anybody can help here:

I have been using htdig for years to crawl a site that now has over
10,000 pages. Since the pages may change frequently, I have been reindexing the whole site once a day.


However, this lazy reindex-everything approach is consuming too many resources.
Therefore I am looking into a better approach: keeping a list of only
the pages that have changed and reindexing just those pages on a much shorter cycle than my current one.


My question is: how can I reindex just a few pages at a time and merge the
crawled pages with a previously indexed site database? That is, index
only the few pages that I list, and follow links only to site pages that
have not yet been indexed.
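To make the question concrete, here is the kind of cycle I have in mind, as an untested sketch. The config paths and the separate "delta" config are my own invention, and I am only guessing from the docs that htmerge's -m option merges a second database like this, so please correct me:

```shell
#!/bin/sh
# Untested sketch of the incremental cycle I would like to run.
# The file names and the delta config are my own, not htdig defaults.

MAIN_CONF=/etc/htdig/htdig.conf          # config for the full site database
DELTA_CONF=/etc/htdig/htdig-delta.conf   # same site, but with start_url
                                         # listing only the changed pages
                                         # and max_hop_count set to 0 so
                                         # no links are followed

# 1) Initial dig of just the changed pages into a separate, small database.
htdig -i -c "$DELTA_CONF"

# 2) Merge the small database into the main site database.
#    (My understanding is that htmerge -m takes the config of the
#    database to merge in; correct me if that is wrong.)
htmerge -c "$MAIN_CONF" -m "$DELTA_CONF"
```

Alternatively, if I understand the docs right, an update dig (htdig without -i) only revisits pages already in the database and honours If-Modified-Since, which might by itself be cheaper than my daily full dig. Is either approach the intended way to do this?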

--

Regards,
Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html



_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
