Hi there, I found an interesting field in the Nutch index directory called "digest". It seems to be a hashed signature of a fetched page's content. Is that true?
I checked my guess by looking at the same page in two different crawl rounds: the value of this field is the same in both segments. What I ultimately want is to detect whether a page I am crawling has been updated. If it has not changed, I won't index it into my search engine. To do this, I plan to compare the "digest" fields of the two fetched copies of the same URL. Is this the right approach? Does Nutch provide an API call to check whether a particular web page has been updated?

Thanks,
Michael
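
P.S. The comparison described above can be sketched roughly as follows. This is only a sketch under assumptions: it assumes the digest is an MD5 hash of the raw page bytes (which is my guess, not something confirmed), and the class `DigestCheck` and its methods `md5Hex` and `hasChanged` are hypothetical names made up for illustration, not Nutch API calls. It uses only the standard `java.security.MessageDigest` class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; not part of Nutch.
final class DigestCheck {

    // Hex-encoded MD5 of the raw page bytes (assumed digest scheme).
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // A page counts as changed if there is no previous digest
    // or the two digests differ.
    static boolean hasChanged(String oldDigest, String newDigest) {
        return oldDigest == null || !oldDigest.equals(newDigest);
    }

    public static void main(String[] args) {
        // Two fetches of the same URL from different crawl rounds (made-up content).
        byte[] round1 = "<html>same page</html>".getBytes(StandardCharsets.UTF_8);
        byte[] round2 = "<html>same page</html>".getBytes(StandardCharsets.UTF_8);

        String oldDigest = md5Hex(round1);
        String newDigest = md5Hex(round2);

        if (hasChanged(oldDigest, newDigest)) {
            System.out.println("changed: re-index");
        } else {
            System.out.println("unchanged: skip indexing");
        }
    }
}
```

One caveat with hashing the raw bytes: any byte-level difference (a timestamp or a rotating ad in the page, for example) would make the page look updated even if the main text is the same.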