Yes, thanks, it is related. However, it applies not only to DB_GONE pages but to all pages whose fetchInterval is greater than the max interval.
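
To make this concrete, here is a rough sketch of what I think is happening. It is not the actual Nutch code: Page, Status, forceFetchAsObserved, clampOnly and simulateCycle are names I made up for illustration, and I'm assuming that a successful fetch puts the page's fetchInterval back to the default. It compares resetting the status with only correcting the interval.

```java
public class FetchIntervalSketch {

    // Hypothetical status values, named after the ones mentioned in this thread.
    enum Status { STATUS_UNFETCHED, STATUS_FETCHED }

    // Hypothetical stand-in for a webtable row; NOT the real Nutch page class.
    static final class Page {
        Status status;
        int fetchInterval; // seconds

        Page(Status status, int fetchInterval) {
            this.status = status;
            this.fetchInterval = fetchInterval;
        }
    }

    // Behaviour as I observe it: an over-long interval is pulled back to the
    // maximum AND the status is reset, whatever the status was before.
    static void forceFetchAsObserved(Page page, int maxInterval) {
        if (page.fetchInterval > maxInterval) {
            page.fetchInterval = maxInterval;
            page.status = Status.STATUS_UNFETCHED;
        }
    }

    // Alternative: only correct the interval and leave the status alone.
    static void clampOnly(Page page, int maxInterval) {
        if (page.fetchInterval > maxInterval) {
            page.fetchInterval = maxInterval;
        }
    }

    // One crawl cycle. Assumption: a successful fetch marks the page as fetched
    // and sets its interval to the default; the update step then checks the max.
    static void simulateCycle(Page page, int defaultInterval, int maxInterval,
                              boolean resetStatus) {
        page.status = Status.STATUS_FETCHED;
        page.fetchInterval = defaultInterval;
        if (resetStatus) {
            forceFetchAsObserved(page, maxInterval);
        } else {
            clampOnly(page, maxInterval);
        }
    }

    public static void main(String[] args) {
        int day = 24 * 60 * 60;
        int defaultInterval = 30 * day; // misconfigured: default > max
        int maxInterval = 7 * day;

        Page observed = new Page(Status.STATUS_UNFETCHED, 0);
        Page clamped = new Page(Status.STATUS_UNFETCHED, 0);
        for (int cycle = 0; cycle < 3; cycle++) {
            simulateCycle(observed, defaultInterval, maxInterval, true);
            simulateCycle(clamped, defaultInterval, maxInterval, false);
        }
        System.out.println("reset status: " + observed.status + ", interval=" + observed.fetchInterval);
        System.out.println("clamp only:   " + clamped.status + ", interval=" + clamped.fetchInterval);
    }
}
```

With default > max, the first variant leaves the page unfetched after every cycle, which matches what I see in my webtable. The second keeps the interval within the maximum without touching the status.
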
Actually, I'm still a bit puzzled by the scheduling-related parameters and the way the AbstractFetchSchedule handles them. Why do pages with a fetchInterval > maxInterval suddenly have to be fetched? I would say that if we encounter such pages, we correct the fetchInterval (set it to the maxInterval) and leave it at that. Also, I would suggest that we only do this at DbUpdate time.

Mathijs

On Feb 28, 2012, at 14:41, Markus Jelsma wrote:

> https://issues.apache.org/jira/browse/NUTCH-578
> https://issues.apache.org/jira/browse/NUTCH-1245
>
> Is your issue similar to these?
>
> On Tuesday 28 February 2012 14:09:25 Mathijs Homminga wrote:
>> Hi,
>>
>> Does anyone know why the AbstractFetchSchedule.forceFetch method sets the
>> page.status to STATUS_UNFETCHED?
>>
>> The DbUpdateReducer calls this method when the page.fetchInterval exceeds
>> the (current) db.fetch.interval.max. As I understand it, we call this
>> method to keep all fetchIntervals in the webtable within the current
>> maximum, but why reset the page status?
>>
>> I bumped into this because my db.fetch.interval.default >
>> db.fetch.interval.max ;)) After a couple of successful crawl cycles, all
>> of my webpages still were STATUS_UNFETCHED.
>>
>> Cheers,
>> Mathijs
>
> --
> Markus Jelsma - CTO - Openindex

