On Fri, 3 Oct 2003, Lachlan Andrew wrote:

> Greetings Neal,
>
> I'm not sure that I understand this. If a page 'X' is linked only by
> a page 'Y' which hasn't changed since the previous dig, do we parse
> the unchanged page 'Y'? If so, why not run htdig -i? If not, how
> do we know that page 'X' should still be in the database?
X does not change, but Y does: it no longer has a link to X. If the website is big enough, htdig -i is wasteful of network bandwidth. The logical error, as I see it, is that we revisit the list of documents currently in the index, rather than starting from the beginning and spidering, and then removing all the documents we didn't find links for.

> I'd be inclined not to fix this until after we've released the next
> "archive point", whether that be 3.2.0b5 or 3.2.0rc1...
>
> Cheers,
> Lachlan
>
> On Fri, 3 Oct 2003 08:56, Neal Richter wrote:
> > The workaround is to use 'htdig -i'. This is a disadvantage as we
> > will revisit and index pages even if they haven't changed since the
> > last run of htdig.
> >
> > Here's the Fix:
> >
> > 1) At the start of Htdig, after we've opened the DBs, we 'walk' the
> > docDB and mark EVERY document as Reference_obsolete. I wrote code
> > to do this; it's very short.
>
> --
> [EMAIL PROTECTED]
> ht://Dig developer DownUnder (http://www.htdig.org)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
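For illustration only, the proposed fix amounts to a mark-and-sweep over the index: mark every document obsolete, spider from the start URLs and un-mark whatever is reached, then delete what is still marked. The sketch below is NOT ht://Dig code. DocDB, LinkGraph, RefState and the three functions are hypothetical stand-ins (only the Reference_obsolete idea comes from the thread), and the link graph is assumed to already reflect each page's current links. How links on unchanged pages are obtained without re-fetching them, which is Lachlan's question, is left open here just as it is in the thread.

```cpp
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical document states; "Obsolete" mirrors the Reference_obsolete
// state named in the thread, everything else is illustrative.
enum class RefState { Normal, Obsolete };

struct DocRecord {
    RefState state = RefState::Normal;
};

// Hypothetical stand-ins: DocDB maps URL -> record; LinkGraph maps URL ->
// the outgoing links found on the *current* version of that page.
using DocDB = std::map<std::string, DocRecord>;
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Step 1: walk the docDB and mark EVERY document as obsolete.
void markAllObsolete(DocDB& db) {
    for (auto& entry : db) entry.second.state = RefState::Obsolete;
}

// Step 2: spider breadth-first from the start URLs; every document reached
// is live again (new URLs are inserted, old ones are un-marked).
void spider(DocDB& db, const LinkGraph& links,
            const std::vector<std::string>& starts) {
    std::deque<std::string> queue(starts.begin(), starts.end());
    std::set<std::string> seen(starts.begin(), starts.end());
    while (!queue.empty()) {
        std::string url = queue.front();
        queue.pop_front();
        db[url].state = RefState::Normal;
        auto it = links.find(url);
        if (it == links.end()) continue;
        for (const auto& next : it->second) {
            if (seen.insert(next).second) queue.push_back(next);
        }
    }
}

// Step 3: remove every document that no reachable page links to any more.
void sweepObsolete(DocDB& db) {
    for (auto it = db.begin(); it != db.end();) {
        if (it->second.state == RefState::Obsolete) {
            it = db.erase(it);
        } else {
            ++it;
        }
    }
}
```

In the X/Y scenario, an index holding "/", "/Y" and "/X", where "/Y" no longer links to "/X", ends up without "/X" after the three steps run in order. Revisiting only the documents already in the index would never notice this, because "/X" itself is still reachable by URL.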