ASF GitHub Bot commented on NUTCH-2550:

HansBrende opened a new pull request #309: fix for NUTCH-2550 contributed by 
Hans Brende
URL: https://github.com/apache/nutch/pull/309
   This simple patch should do the trick. Tested locally and everything works 
as expected again.

This is an automated message from the Apache Git Service.

> Redirects are broken
> --------------------
>                 Key: NUTCH-2550
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2550
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.15
>            Reporter: Hans Brende
>            Priority: Blocker
>             Fix For: 1.15
> As I detailed in this github 
> [comment|https://github.com/apache/nutch/commit/c93d908bb635d3c5b59f8c8a22e0584ebf588794#r28470348],
>  it appears that PR #221 broke redirects. The fetcher will repeatedly fetch 
> the *original url* rather than the one it's supposed to be redirecting to 
> until {{http.redirect.max}} is exceeded, and then end with 
> I noticed this issue when I was trying to crawl a site with a 301 MOVED 
> PERMANENTLY status code.
> Should be pretty easy to fix though: I was able to get redirects working 
> again simply by inserting the code {code:java}url = fit.url;{code} 
> [here|https://github.com/apache/nutch/blob/8682b96c3b84018f187eabaadc096ceded34f250/src/java/org/apache/nutch/fetcher/FetcherThread.java#L388]
>  and 
> [here|https://github.com/apache/nutch/blob/8682b96c3b84018f187eabaadc096ceded34f250/src/java/org/apache/nutch/fetcher/FetcherThread.java#L409].

This message was sent by Atlassian JIRA
