Re: Drawing an analogy between AdaptiveFetchSchedule and AdaptiveCrawlDelay

2012-03-02 Thread Andrzej Bialecki

On 02/03/2012 12:45, Lewis John Mcgibbney wrote:

Hi Guys,

Following some comments on the user list, I recently started digging
into HTTP redirects and stumbled across NUTCH-1042. Although redirects
and crawl delays are separate issues, I think they are certainly linked.
What is interesting is that users 'usually' don't consider them to be
interlinked, and therefore struggle to debug how and why pages affected
by either a redirect or a crawl delay are not being fetched.

Doing some more digging I found the now rather old and tatty NUTCH-475,
which got me thinking about how we maintain the
AdaptiveFetchSchedule for custom refetching. That led me to the
following:

- Regardless of whether we implement an AdaptiveCrawlDelay, NUTCH-1042
still needs to be fixed, as this is obviously becoming a bit of a pain for
some users.


Yes.


- Can someone shed some light on what happened to the Fetcher2.java that
Dogacan refers to? I was only ever accustomed to OldFetcher and Fetcher :0)


Fetcher2 is the current Fetcher. The original Fetcher was temporarily 
renamed OldFetcher and then removed.


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



2012-03-02 Thread Lewis John Mcgibbney
Hi Andrzej,

On Fri, Mar 2, 2012 at 12:37 PM, Andrzej Bialecki a...@getopt.org wrote:

 Fetcher2 is the current Fetcher. The original Fetcher was temporarily
 renamed OldFetcher and then removed.


So it looks like this 'might' be more straightforward to implement than I
originally thought. When I get a bit of time I would like to dive into it.

Thanks