Daniele Menozzi wrote:
ok, so the depth value is only used to stop the crawling at a certain
point, and proceed with the indexing, right?
Yes, depth is in fact the number of iterations of the
generate/fetch/update cycle.
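
To make the idea concrete, here is a rough sketch of a depth-limited crawl loop. It is not Nutch code: the LINKS graph and the crawl function are made up for illustration, and the real tools work on a web database and segments on disk rather than in-memory sets.

```python
# Toy link graph standing in for the web (hypothetical data, not Nutch code).
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def crawl(seeds, depth):
    """Run up to `depth` rounds of generate/fetch/update; return fetched URLs."""
    known = set(seeds)   # every URL the crawl database has seen so far
    fetched = set()      # URLs that have already been fetched
    for _ in range(depth):
        # generate: select known URLs that have not been fetched yet
        segment = sorted(known - fetched)
        if not segment:
            break  # nothing left to fetch, stop early
        # fetch: "download" each page in the segment and collect its outlinks
        outlinks = []
        for url in segment:
            fetched.add(url)
            outlinks.extend(LINKS.get(url, []))
        # update: add newly discovered URLs to the database
        known.update(outlinks)
    return fetched
```

With the seed "a", a depth of 1 fetches only the seed page, and each extra round reaches pages one link further away.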
But, another thing: how can I refresh old pages? What class do I have to
use?
nutch generate will include already fetched pages in a new segment for
re-fetching after some time (I think the default is 30 days, and you can
change it in the config file). And if you deduplicate the segments, the
old copy of the page will be removed from the index.
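
The re-fetch rule can be pictured roughly like this. The function name and the interval_days parameter are hypothetical, and the 30-day default is only the value I assumed above, so check your config file for the real setting.

```python
from datetime import datetime, timedelta


def due_for_refetch(last_fetched, now, interval_days=30):
    """Return True if a page fetched at `last_fetched` should be fetched again.

    interval_days is an assumed default for this sketch, not a value read
    from any real configuration.
    """
    return now - last_fetched >= timedelta(days=interval_days)


# Example: a page fetched on 1 January becomes due again on 31 January.
first_fetch = datetime(2005, 1, 1)
due_on_day_29 = due_for_refetch(first_fetch, datetime(2005, 1, 30))
due_on_day_30 = due_for_refetch(first_fetch, datetime(2005, 1, 31))
```

A page that is due is simply put into the next generated segment like any other URL.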
regards
Piotr