On 02/03/2012 12:45, Lewis John Mcgibbney wrote:
Hi Guys,

Following some comments on the user list, I recently started digging
into http redirects and stumbled across NUTCH-1042. Although redirects
and crawl delays are separate issues, I think they are certainly linked.
What is interesting is that users 'usually' don't consider them to be
interlinked, and therefore struggle to debug how and why either the
redirected pages or the pages subject to a crawl delay are not being
fetched.

Doing some more digging, I found the now rather old and tatty NUTCH-475,
which got me thinking about how we maintain the AdaptiveFetchSchedule
for custom refetching. I am now starting to think about the following:

- Regardless of whether we implement an AdaptiveCrawlDelay, NUTCH-1042
still needs to be fixed, as this is becoming a bit of a pain for some
users.

Yes.

- Can someone shine some light on what happened to Fetcher2.java that
Dogacan refers to? I was only ever accustomed to OldFetcher and Fetcher :0)

Fetcher2 is the current Fetcher. The original Fetcher was temporarily renamed OldFetcher and then removed.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
