Yossi Tamari created NUTCH-2509:

             Summary: Inconsistent behavior in SitemapProcessor
                 Key: NUTCH-2509
                 URL: https://issues.apache.org/jira/browse/NUTCH-2509
             Project: Nutch
          Issue Type: Bug
          Components: sitemap
    Affects Versions: 1.14
            Reporter: Yossi Tamari

There are two inconsistent behaviors in SitemapProcessor:
 # The member variable maxRedir is supposed to limit the number of redirects
followed for sitemap URLs, and it is initialized from the config property
sitemap.redir.max. However, it is ignored: the relevant method declares a local
variable with the same name, which shadows the field and is always set to 3.
 # When a sitemap URL goes through a redirect, the resulting URL is filtered and
normalized. However, a sitemap URL that comes from a sitemapindex is not. This
seems inconsistent, since in both cases the URL comes from an outside source.
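To illustrate the first problem, here is a minimal, hypothetical sketch. It is not the actual SitemapProcessor code. The class RedirectLimitDemo, its methods, and the fallback value of 3 when sitemap.redir.max is unset are all assumptions made for illustration, and a plain Map stands in for the Nutch configuration. The buggy method redeclares maxRedir locally, so a configured limit of 5 never takes effect.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified illustration; not the actual SitemapProcessor code.
class RedirectLimitDemo {

    // Field meant to hold the configured limit (stand-in for sitemap.redir.max).
    private final int maxRedir;

    RedirectLimitDemo(Map<String, String> conf) {
        // Assumption: fall back to 3 when the property is not set.
        this.maxRedir = Integer.parseInt(conf.getOrDefault("sitemap.redir.max", "3"));
    }

    // Buggy variant: the local variable shadows the field, so the config is ignored.
    String resolveBuggy(String url, Map<String, String> redirects) {
        int maxRedir = 3;
        return follow(url, redirects, maxRedir);
    }

    // Fixed variant: no local redeclaration, so the configured field is used.
    String resolveFixed(String url, Map<String, String> redirects) {
        return follow(url, redirects, this.maxRedir);
    }

    // Follows redirects (source -> target) until a URL with no redirect is reached.
    // Returns null if more than 'limit' redirects would be needed.
    private static String follow(String url, Map<String, String> redirects, int limit) {
        String current = url;
        int hops = 0;
        while (redirects.containsKey(current)) {
            if (hops >= limit) {
                return null;
            }
            current = redirects.get(current);
            hops++;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("sitemap.redir.max", "5");
        RedirectLimitDemo demo = new RedirectLimitDemo(conf);

        // A chain of four redirects: a -> b -> c -> d -> e.
        Map<String, String> redirects = new HashMap<>();
        redirects.put("a", "b");
        redirects.put("b", "c");
        redirects.put("c", "d");
        redirects.put("d", "e");

        System.out.println(demo.resolveBuggy("a", redirects)); // null: limit stuck at 3
        System.out.println(demo.resolveFixed("a", redirects)); // e: configured limit of 5
    }
}
```

The fix in this sketch is simply to drop the local declaration so the method reads the field that was populated from configuration.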
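One possible direction for the second problem, sketched under assumptions: route every externally sourced sitemap URL, whether it is a redirect target or an entry in a sitemapindex, through a single filter-and-normalize helper. SitemapUrlGate and its methods are hypothetical. The normalizer and filter are plain java.util.function stand-ins for Nutch's URL normalizers and filters, with null meaning "rejected".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch: one entry point for filtering and normalizing every
// sitemap URL that comes from outside, whatever its origin.
class SitemapUrlGate {

    private final UnaryOperator<String> normalizer; // may return null to reject
    private final Predicate<String> filter;         // false means "rejected"

    SitemapUrlGate(UnaryOperator<String> normalizer, Predicate<String> filter) {
        this.normalizer = normalizer;
        this.filter = filter;
    }

    // Normalizes, then filters. Returns null if the URL is rejected.
    String filterNormalize(String url) {
        if (url == null) {
            return null;
        }
        String normalized = normalizer.apply(url);
        if (normalized == null || !filter.test(normalized)) {
            return null;
        }
        return normalized;
    }

    // Redirect targets go through the same helper...
    String onRedirect(String target) {
        return filterNormalize(target);
    }

    // ...and so do the child sitemap URLs listed in a sitemapindex.
    List<String> onSitemapIndex(List<String> childUrls) {
        List<String> accepted = new ArrayList<>();
        for (String child : childUrls) {
            String url = filterNormalize(child);
            if (url != null) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Toy normalizer (trim whitespace) and filter (HTTPS only), for illustration.
        SitemapUrlGate gate = new SitemapUrlGate(String::trim, u -> u.startsWith("https://"));
        // prints "https://example.com/sitemap.xml"
        System.out.println(gate.onRedirect(" https://example.com/sitemap.xml "));
        // prints "[https://example.com/s1.xml]"
        System.out.println(gate.onSitemapIndex(Arrays.asList(
                "https://example.com/s1.xml", "http://example.com/s2.xml")));
    }
}
```

Keeping both code paths behind one helper means a change to filtering or normalization rules cannot silently apply to one source of URLs and not the other.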

