Lewis John McGibbney created NUTCH-1941:
-------------------------------------------

             Summary: Optional rolling http.agent.name's
                 Key: NUTCH-1941
                 URL: https://issues.apache.org/jira/browse/NUTCH-1941
             Project: Nutch
          Issue Type: Bug
          Components: fetcher, protocol
            Reporter: Lewis John McGibbney
            Priority: Trivial


In some scenarios, even when a crawl adheres to fetcher.crawl.delay, web 
admins can block a fetcher based merely on its crawler name. 
I propose adding optional rolling http.agent.name values, which could be 
substituted at a fixed interval, for example every 5 seconds. Successive 
requests to the same domain would then be sent with different 
http.agent.name values. 
This behavior should be off by default.
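
A minimal sketch of how such a rotation might look, purely for discussion. 
The class RollingAgentName, its constructor parameters and the sample agent 
names are hypothetical and are not part of Nutch; the sketch only assumes a 
configured list of names, a rotation interval, and an on/off switch that 
falls back to the first name when rolling is disabled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not an existing Nutch class): selects an agent name
 * from a fixed list based on which time slot the given timestamp falls in.
 */
final class RollingAgentName {
    private final List<String> names;
    private final long intervalMillis;
    private final boolean enabled;

    RollingAgentName(List<String> names, long intervalMillis, boolean enabled) {
        if (names == null || names.isEmpty()) {
            throw new IllegalArgumentException("at least one agent name is required");
        }
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.names = new ArrayList<>(names); // defensive copy
        this.intervalMillis = intervalMillis;
        this.enabled = enabled;
    }

    /** Returns the agent name to use at the given time (milliseconds). */
    String nameAt(long nowMillis) {
        if (!enabled) {
            return names.get(0); // rolling is off: always the first name
        }
        long slot = Math.floorDiv(nowMillis, intervalMillis);
        int index = (int) Math.floorMod(slot, (long) names.size());
        return names.get(index);
    }

    /** Convenience wrapper using the system clock. */
    String current() {
        return nameAt(System.currentTimeMillis());
    }
}
```

With three names and a 5000 ms interval, nameAt would move to the next name 
every 5 seconds and wrap around after the third. Deriving the name from the 
clock rather than from a counter keeps the helper stateless, so concurrent 
fetcher threads would need no extra synchronization.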



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)