Hello,

  I just saw this thread and am considering incremental updates, since
our site is so large (20,000+ URLs). I have a question, though: will an
incremental update remove any URLs that no longer appear on the site? In
other words, if we remove a document from the website, will running htdig
without the -i option update the database to exclude that URL? Or do we
need to do an 'initial' build in order to have that URL removed?

  And one further question: if no .work files exist, will htdig create
them when run without the -i option? Or do you need to have an initial
set in place for htdig to append to for incremental updates?

  Thanks in advance for any advice!

Cheers,
Jonathan Schlackl


> I'm not exactly sure what you are looking for, but we had a similar
> problem.  We used to do a full index of our entire site every night but
> as you say, that took up too many resources.  We now do a full index
> once a week and an incremental index every night.  In order to do a full
> index, run htdig with the -i option.  In order to run an incremental
> index, run htdig without the -i option.
>
>
> Malki Cymbalista
> Webmaster, Weizmann Institute of Science
> Rehovot, Israel 76100
> Internet: [EMAIL PROTECTED]
> 08-934-3036
>
>>>> Manuel Lemos <[EMAIL PROTECTED]> 01/11/2004 19:29:26 >>>
> Hello,
>
>
> I have been using htdig for years to crawl a site that now has over
> 30,000 pages. Since the pages may change often, I have been
> reindexing the whole site on a daily basis.
>
> However, this lazy indexing approach is taking too many resources.
> Therefore I am looking into a better approach: keeping a list of only
> the pages that have changed and reindexing just those pages in a much
> shorter cycle than what I am doing now.
>
> My question is how can I reindex just a few pages at once and merge the
> crawled pages with a previously indexed site database? I mean, index
> only a few pages that I list and only follow links to site pages that
> were not yet indexed.
>
> --
>
> Regards,
> Manuel Lemos
>
> PHP Classes - Free ready to use OOP components written in PHP
> http://www.phpclasses.org/
>
> PHP Reviews - Reviews of PHP books and other products
> http://www.phpclasses.org/reviews/
>
> Metastorage - Data object relational mapping layer generator
> http://www.meta-language.net/metastorage.html
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Sybase ASE Linux Express Edition - download now for FREE
> LinuxWorld Reader's Choice Award Winner for best database on Linux.
> http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
>
>


