AJ Chen wrote:
Two questions:
(1) Is there a better approach to deep-crawl large sites?

If a site has N pages that each require T seconds on average to fetch, then fetching the entire site will take roughly N*T seconds. If that's longer than you're willing to wait, then you won't be able to fetch the entire site. If you are willing to wait, then set http.max.delays to Integer.MAX_VALUE and wait. In this case there's no shortcut.
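
For example, assuming you override settings in the usual conf/nutch-site.xml file, something like the following sketch would keep the fetcher waiting on a busy host rather than dropping pages (2147483647 is Integer.MAX_VALUE):

    <property>
      <name>http.max.delays</name>
      <!-- illustrative value: effectively never give up waiting for the host -->
      <value>2147483647</value>
    </property>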

(2) Will the dropped urls be picked up again in subsequent cycles of fetchlist/segment/fetch/updatedb?

They will be retried in the next cycle, up to db.fetch.retry.max times.
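
If you want more retries before a page is given up on, that property can also be raised in conf/nutch-site.xml; the value below is just an illustration, not a recommended setting:

    <property>
      <name>db.fetch.retry.max</name>
      <!-- example: allow up to 5 fetch attempts before a page is dropped -->
      <value>5</value>
    </property>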

Doug
