Hi, I am Vivek and I am working on Nutch. I have a few doubts regarding crawling:
1) When a page is fetched, it becomes due for re-fetching after 30 days (the default fetch interval). What happens if the page no longer exists on the web by then? Is it removed from the crawldb and segments, or does it still remain there? And if a page is stale and I want to remove it from my crawled data, how can I do that? (A sketch of what I had in mind follows below this list.)

2) How do I refresh a crawl? Suppose I have crawled 100,000 URLs to depth 5, and I want fetching to start again from the beginning without stopping the process, i.e. a continuous crawl from depth 1 to 5, repeated as a cycle. (See the loop sketch after the first one below.)
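For question 1, here is a rough sketch of the approach I had in mind, assuming Nutch 1.x shell commands (the crawl/crawldb layout, the dump directory name, and the filter workflow are my assumptions, not something I found documented end to end). My understanding is that a page that disappears eventually gets marked db_gone in the crawldb rather than being deleted, so the idea is to filter such URLs out explicitly:

    # Inspect crawl statuses; URLs that no longer exist should show up
    # as db_gone once the retry limit is reached.
    bin/nutch readdb crawl/crawldb -stats
    bin/nutch readdb crawl/crawldb -dump crawldb_dump

    # After adding exclusion patterns for the stale URLs to
    # conf/regex-urlfilter.txt, rewrite the crawldb through the filters:
    # mergedb with -filter applies the configured URLFilters, so the
    # excluded URLs are dropped from the output crawldb.
    bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter

    # Swap in the filtered crawldb; segments are per-fetch-run data, so
    # old segments holding the stale pages can simply be deleted once
    # they are no longer needed for indexing.
    mv crawl/crawldb crawl/crawldb_old
    mv crawl/crawldb_filtered crawl/crawldb

Is this the right way to do it, or is there a cleaner mechanism?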
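For question 2, this is the kind of loop I was imagining, again assuming Nutch 1.x, with depth 5, -topN 100000, and the crawl/ paths as placeholders:

    #!/bin/bash
    # Run the generate/fetch/parse/updatedb cycle to depth 5, forever.
    while true; do
      for depth in 1 2 3 4 5; do
        bin/nutch generate crawl/crawldb crawl/segments -topN 100000
        # Segments are named by timestamp, so the newest sorts last.
        SEGMENT=`ls -d crawl/segments/2* | tail -1`
        bin/nutch fetch $SEGMENT
        bin/nutch parse $SEGMENT
        bin/nutch updatedb crawl/crawldb $SEGMENT
      done
      # After depth 5 the outer loop starts over at depth 1. Pages only
      # become eligible for re-fetch once their fetch interval
      # (db.fetch.interval.default, 30 days) has elapsed, so generate
      # may produce an empty fetch list in between.
    done

One thing I am not sure about is what to do when generate produces no new segment: the ls | tail -1 trick would then pick up the previous segment and fetch it again. Is there a standard way to handle that in a continuous crawl?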
--
Thanks and Regards,
VIVEK KOUL
