On Fri, 3 Oct 2003, Lachlan Andrew wrote:

> Greetings Neal,
>
> I'm not sure that I understand this.  If a page 'X' is linked only by
> a page 'Y' which hasn't changed since the previous dig, do we parse
> the unchanged page 'Y'?  If so, why not run htdig -i?  If not, how
> do we know that page 'X' should still be in the database?

X does not change, but Y does: it no longer has a link to X.

If the website is big enough, htdig -i is wasteful of network bandwidth.

The logical error, as I see it, is that we revisit the list of documents
currently in the index, rather than starting from the beginning and
spidering... then removing all the documents we didn't find links to.
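A rough sketch of that mark-and-sweep idea, in Python for brevity (htdig
itself is C++). Every name here (incremental_dig, doc_db, fetch,
stored_links) is hypothetical and not htdig's API. It assumes fetch()
returns None for a page unchanged since the last dig, and that the
outgoing links of each page were saved at the previous run, so an
unchanged page can still be followed without re-downloading it.

```python
from collections import deque


def incremental_dig(doc_db, start_urls, fetch, stored_links):
    """Hypothetical mark-and-sweep update of an existing index.

    doc_db:       dict url -> record dict (the index from the last dig)
    fetch(url):   list of outgoing links, or None if unchanged (assumed)
    stored_links: dict url -> links recorded at the previous dig (assumed)
    Returns the sorted list of URLs removed from doc_db.
    """
    # 1) Mark every document currently in the index as obsolete.
    for record in doc_db.values():
        record["obsolete"] = True

    # 2) Spider from the start URLs, un-marking every document reached.
    seen = set()
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        links = fetch(url)
        if links is None:
            # Unchanged since the last dig: reuse the links saved then.
            links = stored_links.get(url, [])
        else:
            # Changed (or new) page: remember its current links.
            stored_links[url] = links
        doc_db.setdefault(url, {})["obsolete"] = False
        queue.extend(links)

    # 3) Sweep: drop every document no page linked to during this dig.
    removed = sorted(url for url, rec in doc_db.items() if rec["obsolete"])
    for url in removed:
        del doc_db[url]
        stored_links.pop(url, None)
    return removed
```

In the X/Y case above: if "index" links to Y and the changed Y no longer
links to X, X is never reached, stays marked obsolete, and is removed,
while unchanged pages are not downloaded again.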

> I'd be inclined not to fix this until after we've released the next
> "archive point", whether that be 3.2.0b5 or 3.2.0rc1...
> Cheers,
> Lachlan
>
> On Fri, 3 Oct 2003 08:56, Neal Richter wrote:
> > The workaround is to use 'htdig -i'.  This is a disadvantage, as we
> > will revisit and index pages even if they haven't changed since the
> > last run of htdig.
> >
> > Here's the Fix:
> >
> > 1) At the start of htdig, after we've opened the DBs, we 'walk' the
> > docDB and mark EVERY document as Reference_obsolete.  I wrote code
> > to do this; it's very short.
>
> --
> [EMAIL PROTECTED]
> ht://Dig developer DownUnder  (http://www.htdig.org)
>

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
