Hi all

I am relatively new to Nutch and I am trying to understand how it crawls
websites, and more specifically how it creates and prioritises its fetch
list. I have a couple of questions I would like to ask:

   1. What are Nutch's crawl URL sources? I believe they are the WebDB and
   the segments, but I am not sure.
   2. How does Nutch prioritise crawling? By content expiration date only?
   3. Is there some way to affect the order in which Nutch fetches URLs?
   I've been reading the Generator class but haven't found an extension
   point for this.
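To make question 3 concrete, this is roughly the kind of hook I was hoping to find. The interface and class names below are made up for illustration only; I am not claiming this is the actual Nutch API, just a sketch of a per-URL sort value that the generator could consult when ordering the fetch list:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hook: a per-URL sort value consulted when ordering the
// fetch list. Names are invented for illustration, NOT actual Nutch API.
interface GeneratorSortHook {
    float sortValue(String url, float initSort);
}

public class FetchListOrderSketch {

    // Order URLs descending by the hook's sort value (stable sort,
    // so ties keep their original relative order).
    static List<String> orderByScore(List<String> urls, GeneratorSortHook hook) {
        List<String> out = new ArrayList<>(urls);
        out.sort(Comparator.comparingDouble(
                (String u) -> hook.sortValue(u, 0.0f)).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Example policy: boost URLs on a preferred host.
        GeneratorSortHook hook = (url, initSort) ->
                url.contains("example.org") ? initSort + 1.0f : initSort;

        List<String> fetchList = orderByScore(List.of(
                "http://other.com/a",
                "http://example.org/b",
                "http://other.com/c"), hook);

        // prints: [http://example.org/b, http://other.com/a, http://other.com/c]
        System.out.println(fetchList);
    }
}
```

If something along these lines already exists, a pointer to it would be much appreciated.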

Thanks in advance...

Rodrigo
