[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364765#comment-14364765 ]
Lewis John McGibbney commented on NUTCH-1941:
---------------------------------------------

Hi [~asitang]

bq. -clutter of classpath: I used eclipse to create a patch.
Thank you for creating the patch; however, please look at your proposed addition. It does not make sense!

bq. -strange logging: just was giving some strange outputs so that I could find the changes in the Log files.
Please remove these entirely if you would ever like your patch to be included in commercial-grade software. Such logging is great for testing and development, but it is utterly terrible for consumption by the Nutch community.

bq. -unknown files/classes: I was working on nutch 1x and not 2x for this issue.
That is OK, you are absolutely fine.

bq. -"no actual implementation which would allow us to define a list of agent names from which to rotate." : because I was using a counter for that.
Thank you!


> Optional rolling http.agent.name's
> ----------------------------------
>
>                 Key: NUTCH-1941
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1941
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, protocol
>            Reporter: Lewis John McGibbney
>            Priority: Trivial
>         Attachments: NUTCH-1941-ver1.patch, nutch.patch
>
>
> In some scenarios, even whilst adhering to fetcher.crawl.delay, web admins
> can block your fetcher based merely on your crawler name.
> I propose the ability to implement rolling http.agent.name's which could be
> substituted every 5 seconds for example. This would mean that successive
> requests to the same domain would be sent with different http.agent.name.
> This behavior should be off by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
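
For illustration only, the rolling behaviour described in the issue (a configurable list of agent names, substituted at a fixed interval, off by default) could be sketched roughly as follows. This is not the attached patch and not Nutch code: the class `RollingAgentName`, its constructor parameters, and the `nameAt` method are hypothetical names invented for this sketch, and how the list and interval would be read from the Nutch configuration is left open.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): picks an agent name from a
 * configured list based on the current time slot.
 */
final class RollingAgentName {
    private final List<String> names;     // candidate agent names, in rotation order
    private final long intervalMillis;    // how long each name stays active
    private final boolean enabled;        // rotation is meant to be off by default

    RollingAgentName(List<String> names, long intervalMillis, boolean enabled) {
        if (names == null || names.isEmpty()) {
            throw new IllegalArgumentException("at least one agent name is required");
        }
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.names = new ArrayList<>(names); // defensive copy
        this.intervalMillis = intervalMillis;
        this.enabled = enabled;
    }

    /** Returns the agent name that applies at the given wall-clock time. */
    String nameAt(long nowMillis) {
        if (!enabled) {
            // Rotation disabled: always use the first configured name.
            return names.get(0);
        }
        // Number of whole intervals elapsed, wrapped around the list size.
        long slot = Math.floorDiv(nowMillis, intervalMillis);
        int index = (int) Math.floorMod(slot, (long) names.size());
        return names.get(index);
    }

    /** Convenience wrapper using the system clock. */
    String current() {
        return nameAt(System.currentTimeMillis());
    }
}
```

Deriving the index from the time slot rather than from a per-request counter keeps the choice stateless, so concurrent fetcher threads would agree on the active name without sharing a mutable counter. Whether that trade-off suits the fetcher is a design question for the actual patch.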