Hi, I am Vivek and I am working on Nutch. I have a few doubts regarding crawling:
1) When a page is fetched, it becomes due for re-fetching after 30 days (the default fetch interval). What happens if the page no longer exists on the web by then? Is it removed from the crawldb and segments, or does it still remain there? And if a page is stale and I want to remove it from my crawled data, how can I do that? (A sketch of what I had in mind follows below this list.)

2) How do I refresh a crawl? Suppose I have crawled 100,000 URLs to depth 5, and I want fetching to start again from the beginning without stopping the process, i.e. a continuous crawl from depth 1 to 5, repeated as a cycle. (See the loop sketch after the first one below.)
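For question 1, here is a rough sketch of the approach I had in mind, assuming Nutch 1.x shell commands (the crawl/crawldb layout, the dump directory name, and the filter workflow are my assumptions, not something I found documented end to end). My understanding is that a page that disappears eventually gets marked db_gone in the crawldb rather than being deleted, so the idea is to filter such URLs out explicitly:

    # Inspect crawl statuses; URLs that no longer exist should show up
    # as db_gone once the retry limit is reached.
    bin/nutch readdb crawl/crawldb -stats
    bin/nutch readdb crawl/crawldb -dump crawldb_dump

    # After adding exclusion patterns for the stale URLs to
    # conf/regex-urlfilter.txt, rewrite the crawldb through the filters:
    # mergedb with -filter applies the configured URLFilters, so the
    # excluded URLs are dropped from the output crawldb.
    bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter

    # Swap in the filtered crawldb; segments are per-fetch-run data, so
    # old segments holding the stale pages can simply be deleted once
    # they are no longer needed for indexing.
    mv crawl/crawldb crawl/crawldb_old
    mv crawl/crawldb_filtered crawl/crawldb

Is this the right way to do it, or is there a cleaner mechanism?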
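For question 2, this is the kind of loop I was imagining, again assuming Nutch 1.x, with depth 5, -topN 100000, and the crawl/ paths as placeholders:

    #!/bin/bash
    # Run the generate/fetch/parse/updatedb cycle to depth 5, forever.
    while true; do
      for depth in 1 2 3 4 5; do
        bin/nutch generate crawl/crawldb crawl/segments -topN 100000
        # Segments are named by timestamp, so the newest sorts last.
        SEGMENT=`ls -d crawl/segments/2* | tail -1`
        bin/nutch fetch $SEGMENT
        bin/nutch parse $SEGMENT
        bin/nutch updatedb crawl/crawldb $SEGMENT
      done
      # After depth 5 the outer loop starts over at depth 1. Pages only
      # become eligible for re-fetch once their fetch interval
      # (db.fetch.interval.default, 30 days) has elapsed, so generate
      # may produce an empty fetch list in between.
    done

One thing I am not sure about is what to do when generate produces no new segment: the ls | tail -1 trick would then pick up the previous segment and fetch it again. Is there a standard way to handle that in a continuous crawl?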
--
Thanks and Regards,
VIVEK KOUL
