-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 18, 2006 8:02 PM
To: [email protected]
Subject: Re: question about crawldb
Importance: High
Anton Potehin wrote:
> 1. We have found these flags in the CrawlDatum class:
>
> public static final byte STATUS_SIGNATURE = 0;
> public static final byte STATUS_DB_UNFETCHED = 1;
> public static final byte STATUS_DB_FETCHED = 2;
> public static final byte STATUS_DB_GONE = 3;
> public static final byte STATUS_LINKED = 4;
> public static final byte STATUS_FETCH_SUCCESS = 5;
> public static final byte STATUS_FETCH_RETRY = 6;
> public static final byte STATUS_FETCH_GONE = 7;
>
> Though the names of these flags describe their aims, it is not
> completely clear what they mean, or what the difference is between
> STATUS_DB_FETCHED and STATUS_FETCH_SUCCESS, for example.

The STATUS_DB_* codes are used in entries in the crawldb. The STATUS_FETCH_* codes are used in fetcher output. STATUS_LINKED is used in parser output for URLs that are linked to. A crawldb update combines all of these (the old version of the db, plus fetcher and parser output) to generate a new version of the db that contains only STATUS_DB_* entries. This logic is in CrawlDbReducer.

Does that help?

Yes ;-) tnx...
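The crawldb update described above folds the old STATUS_DB_* entry, the fetcher's STATUS_FETCH_* output and the parser's STATUS_LINKED output for one URL into a single STATUS_DB_* entry. The Java below is a minimal sketch of that idea. It is NOT Nutch's CrawlDbReducer. The class name, the `merge` method and the `NO_ENTRY` marker are hypothetical. Retry limits, STATUS_SIGNATURE handling and all CrawlDatum fields other than the status byte are deliberately left out.

```java
import java.util.List;

/**
 * Simplified, hypothetical sketch (not Nutch's CrawlDbReducer) of how a
 * crawldb update could collapse every status seen for one URL into a
 * single STATUS_DB_* value.
 */
public class CrawlDbMergeSketch {
    // Status codes as quoted from CrawlDatum in the thread.
    public static final byte STATUS_SIGNATURE = 0;
    public static final byte STATUS_DB_UNFETCHED = 1;
    public static final byte STATUS_DB_FETCHED = 2;
    public static final byte STATUS_DB_GONE = 3;
    public static final byte STATUS_LINKED = 4;
    public static final byte STATUS_FETCH_SUCCESS = 5;
    public static final byte STATUS_FETCH_RETRY = 6;
    public static final byte STATUS_FETCH_GONE = 7;

    /** Hypothetical marker meaning "write no crawldb entry for this URL". */
    public static final byte NO_ENTRY = -1;

    /**
     * @param statuses all statuses collected for one URL: the old db entry
     *                 (if any), fetcher output and parser output
     * @return the STATUS_DB_* value for the new crawldb, or NO_ENTRY
     */
    public static byte merge(List<Byte> statuses) {
        byte oldDb = NO_ENTRY;   // previous crawldb state, if present
        byte fetch = NO_ENTRY;   // fetcher result, if the URL was fetched
        boolean linked = false;  // true if the parser saw a link to it

        for (byte s : statuses) {
            switch (s) {
                case STATUS_DB_UNFETCHED:
                case STATUS_DB_FETCHED:
                case STATUS_DB_GONE:
                    oldDb = s;
                    break;
                case STATUS_FETCH_SUCCESS:
                case STATUS_FETCH_RETRY:
                case STATUS_FETCH_GONE:
                    fetch = s;
                    break;
                case STATUS_LINKED:
                    linked = true;
                    break;
                default:
                    break; // STATUS_SIGNATURE is ignored in this sketch
            }
        }

        // Fresh fetcher output is the newest information, so it wins.
        if (fetch != NO_ENTRY) {
            switch (fetch) {
                case STATUS_FETCH_SUCCESS:
                    return STATUS_DB_FETCHED;
                case STATUS_FETCH_GONE:
                    return STATUS_DB_GONE;
                default:
                    // Retry: keep it unfetched so it is tried again later
                    // (a real implementation would likely cap retries).
                    return STATUS_DB_UNFETCHED;
            }
        }
        // Not fetched this round: keep whatever the old db said.
        if (oldDb != NO_ENTRY) {
            return oldDb;
        }
        // Only seen as a link: a newly discovered, not-yet-fetched URL.
        if (linked) {
            return STATUS_DB_UNFETCHED;
        }
        return NO_ENTRY;
    }
}
```

For example, a URL that was STATUS_DB_UNFETCHED and comes back from the fetcher as STATUS_FETCH_SUCCESS ends up as STATUS_DB_FETCHED. A URL known only from parser output starts life in the db as STATUS_DB_UNFETCHED. Whatever the inputs, only STATUS_DB_* values (or no entry) come out, matching the rule that the new db contains only STATUS_DB_* entries.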
