[
https://issues.apache.org/jira/browse/NUTCH-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433200#comment-16433200
]
Hudson commented on NUTCH-2550:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch-trunk #3516 (See
[https://builds.apache.org/job/Nutch-trunk/3516/])
fix for NUTCH-2550 contributed by Hans Brende (hans:
[https://github.com/apache/nutch/commit/de190286a43bba7f4b27fd5a84a252bbcafa67c2])
* (edit) src/java/org/apache/nutch/fetcher/FetcherThread.java
> Fetcher fails to follow redirects
> ---------------------------------
>
> Key: NUTCH-2550
> URL: https://issues.apache.org/jira/browse/NUTCH-2550
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.15
> Reporter: Hans Brende
> Priority: Blocker
> Fix For: 1.15
>
>
> As I detailed in this GitHub
> [comment|https://github.com/apache/nutch/commit/c93d908bb635d3c5b59f8c8a22e0584ebf588794#r28470348],
> it appears that PR #221 broke redirects. The fetcher repeatedly fetches
> the *original URL* rather than the redirect target
> until {{http.redirect.max}} is exceeded, and then ends with
> {{STATUS_FETCH_GONE}}.
> I noticed this issue while trying to crawl a site that returns a
> 301 MOVED PERMANENTLY status code.
> Should be pretty easy to fix though: I was able to get redirects working
> again simply by inserting the code {code:java}url = fit.url{code}
> [here|https://github.com/apache/nutch/blob/8682b96c3b84018f187eabaadc096ceded34f250/src/java/org/apache/nutch/fetcher/FetcherThread.java#L388]
> and
> [here|https://github.com/apache/nutch/blob/8682b96c3b84018f187eabaadc096ceded34f250/src/java/org/apache/nutch/fetcher/FetcherThread.java#L409].
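
The failure mode described in the report can be illustrated with a small, self-contained Java sketch. This is not the code from FetcherThread.java: the Item class, the REDIRECTS table, the fetch method and its applyFix flag are hypothetical stand-ins made up for illustration. Only the assignment url = fit.url is taken from the report, and the real fix may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

class RedirectLoopSketch {
    /** Hypothetical stand-in for the queued fetch item; it only carries a URL. */
    static final class Item {
        final String url;
        Item(String url) { this.url = url; }
    }

    /** Hypothetical redirect table: source URL -> redirect target. */
    static final Map<String, String> REDIRECTS = new HashMap<>();
    static {
        REDIRECTS.put("http://example.com/old", "http://example.com/new");
    }

    /**
     * Simulates a fetch loop that follows redirects.
     * Returns the final URL, or null when the redirect limit is exceeded
     * (the situation the report describes as ending in STATUS_FETCH_GONE).
     */
    static String fetch(String startUrl, int maxRedirects, boolean applyFix) {
        Item fit = new Item(startUrl);
        String url = fit.url;
        int redirectCount = 0;
        while (true) {
            // "Fetch" whatever the local variable url points at.
            String target = REDIRECTS.get(url);
            if (target == null) {
                return url; // no redirect: the page was fetched
            }
            if (++redirectCount > maxRedirects) {
                return null; // limit exceeded
            }
            // A new item is created for the redirect target ...
            fit = new Item(target);
            // ... but without the next line, url still names the original page,
            // so the same redirecting URL is fetched again on every iteration.
            if (applyFix) {
                url = fit.url;
            }
        }
    }
}
```

In this sketch, calling fetch with applyFix set to false keeps requesting the original URL until the limit is hit, while setting it to true lets the loop move on to the redirect target after the first iteration.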
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)