[ 
https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347735#comment-16347735
 ] 

Markus Jelsma commented on NUTCH-2466:
--------------------------------------

Hello Moreno,

Well, we obviously could allow a -1 setting and treat that as forever, but 
forever is infinite and it would hang the Nutch task until Hadoop treats it as 
timed out, usually within ten minutes.

The setting is an int, so if you want, you can set it to the maximum positive 
integer and handle just over two billion consecutive redirects.

I believe that would justify the meaning of forever in this context, do you 
agree?

As a side note, having dealt with the crudeness of the WWW for many years, I 
consider any sequence of more than four redirects to be the root of a whole 
other problem. Our (company, not ASF Nutch) maximum setting is always three; 
anything higher has, so far, always led to circular redirects.
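
To illustrate the point about a bounded limit, here is a rough sketch in Java. 
It is not the Nutch code and not the attached patch: the class RedirectSketch, 
the method resolve and the in-memory redirect table are all made up for the 
example, and no real HTTP requests are made. It only shows how an int limit 
stops a circular chain, and what the largest possible int limit would be.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch code: follows redirects through an in-memory
// table instead of real HTTP, and gives up after maxRedirects hops.
class RedirectSketch {

    // Returns the final URL, or null if the chain needs more than maxRedirects hops.
    static String resolve(Map<String, String> redirects, String url, int maxRedirects) {
        String current = url;
        int hops = 0;
        while (redirects.containsKey(current)) {
            if (hops >= maxRedirects) {
                return null; // limit reached, likely a loop or a broken site
            }
            current = redirects.get(current);
            hops++;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://example.org/sitemap.xml", "https://example.org/sitemap.xml");
        redirects.put("https://example.org/sitemap.xml", "https://example.org/sitemap_index.xml");

        // Two hops (http > https > sitemap_index.xml) fit within a limit of three.
        System.out.println(resolve(redirects, "http://example.org/sitemap.xml", 3));

        // A circular redirect never resolves on its own; the limit ends it.
        redirects.put("https://example.org/a", "https://example.org/b");
        redirects.put("https://example.org/b", "https://example.org/a");
        System.out.println(resolve(redirects, "https://example.org/a", 3));

        // The largest value an int setting can hold: 2147483647.
        System.out.println(Integer.MAX_VALUE);
    }
}
```

With a limit of three, the first call prints the sitemap_index.xml URL and the 
circular case prints null. With the limit set to Integer.MAX_VALUE, the 
circular case would only end after just over two billion hops.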


> Sitemap processor to follow redirects
> -------------------------------------
>
>                 Key: NUTCH-2466
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2466
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.15
>
>         Attachments: NUTCH-2466.patch, NUTCH-2466.patch, NUTCH-2466.patch
>
>
> It does follow http > https, but not the following redirect, e.g. 
> sitemap_index.xml that some websites have.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
