Hello,

I just saw this thread and am considering incremental updates, since our site is so large (20,000+ URLs). I have a question, though: will an incremental update remove URLs that no longer appear on the site? In other words, if we remove a document from the website, will running htdig without the -i option update the database to exclude that URL? Or do we need to do an 'initial' build to have that URL removed?

And one further question: if no .work files exist, will htdig create them when run without the -i option? Or do you need an initial set in place for htdig to append to during incremental updates?

Thanks in advance for any advice!

Cheers,
Jonathan Schlackl

> I'm not exactly sure what you are looking for, but we had a similar
> problem. We used to do a full index of our entire site every night, but
> as you say, that took up too many resources. We now do a full index
> once a week and an incremental index every night. To do a full
> index, run htdig with the -i option. To run an incremental
> index, run htdig without the -i option.
>
> Malki Cymbalista
> Webmaster, Weizmann Institute of Science
> Rehovot, Israel 76100
> Internet: [EMAIL PROTECTED]
> 08-934-3036
>
> >>> Manuel Lemos <[EMAIL PROTECTED]> 01/11/2004 19:29:26 >>>
> Hello,
>
> I have been using htdig for years to crawl a site that now has over
> 30,000 pages. Since the pages may change often, I have been
> reindexing the whole site on a daily basis.
>
> However, this lazy indexing approach is taking too many resources.
> Therefore I am looking into a better approach: keeping a list of only
> the pages that have changed and reindexing just those pages in a much
> shorter cycle than what I am doing now.
>
> My question is: how can I reindex just a few pages at once and merge the
> crawled pages with a previously indexed site database? I mean, index
> only a few pages that I list and only follow links to site pages that
> were not yet indexed.
>
> --
>
> Regards,
> Manuel Lemos
>
> PHP Classes - Free ready to use OOP components written in PHP
> http://www.phpclasses.org/
>
> PHP Reviews - Reviews of PHP books and other products
> http://www.phpclasses.org/reviews/
>
> Metastorage - Data object relational mapping layer generator
> http://www.meta-language.net/metastorage.html
>
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Sybase ASE Linux Express Edition - download now for FREE
> LinuxWorld Reader's Choice Award Winner for best database on Linux.
> http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
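
For anyone wanting to script the schedule described in the quoted reply (a full index once a week, an incremental index every other night), here is a minimal sketch of a nightly wrapper. The only htdig behaviour it relies on is what the thread states: -i forces a full index, and leaving it off gives an incremental run. The choice of Sunday for the full rebuild, the function names, and the assumption that htdig is on the PATH are all mine, not from the thread. Whatever post-processing your setup runs after htdig is left out.

```python
import datetime
import subprocess

# Assumption: the weekly full rebuild happens on Sunday. The thread only
# says "once a week", so change this to suit your site.
FULL_INDEX_WEEKDAY = 6  # date.weekday(): Monday is 0, Sunday is 6


def htdig_command(today, full_weekday=FULL_INDEX_WEEKDAY):
    """Return the htdig argument list to use for the given date."""
    if today.weekday() == full_weekday:
        # Full index: htdig with -i, as described in the thread.
        return ["htdig", "-i"]
    # Incremental index: htdig without -i.
    return ["htdig"]


def run_nightly(today=None):
    """Run tonight's index; raises CalledProcessError if htdig fails."""
    command = htdig_command(today or datetime.date.today())
    subprocess.run(command, check=True)
    return command
```

Calling run_nightly() from a single nightly cron entry keeps the weekly/nightly decision in one place, instead of maintaining two separate cron lines.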