[ https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653368#action_12653368 ]
Todd Lipcon commented on NUTCH-207:
-----------------------------------

Any word on this JIRA? This would be a very useful feature for me. We are bandwidth constrained in the sense that we could easily pull a couple hundred Mbit/s but don't want to go over our 95th percentile commit. I imagine others are in a similar situation.

Tweaking the number of fetchers gets us in the ballpark, but a feature like this would be far superior, since crawls often start off pulling higher than our commit and then slow to 60% of our commit later on.

If it's an issue of porting the patch against the current code, I can take that on.

> Bandwidth target for fetcher rather than a thread count
> -------------------------------------------------------
>
>                 Key: NUTCH-207
>                 URL: https://issues.apache.org/jira/browse/NUTCH-207
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Rod Taylor
>         Attachments: ratelimit.patch
>
>
> Increases or decreases the number of threads from the starting value
> (fetcher.threads.fetch) up to a maximum (fetcher.threads.maximum) to achieve
> a target bandwidth (fetcher.threads.bandwidth).
> It seems to be able to keep within 10% of the target bandwidth even when
> large numbers of errors are found or when a number of large pages is run
> across.
> To achieve more accurate tracking Nutch should keep track of protocol
> overhead as well as the volume of pages downloaded.
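
For readers who have not opened ratelimit.patch, the adjustment described in the quoted issue can be pictured roughly as below. This is only a minimal sketch and not the patch's code: the function name, the one-thread step, the floor of one thread, and the 10% dead band are all assumptions made for illustration. Only the idea of moving the thread count between a starting value and a maximum to approach a target bandwidth comes from the issue text. Bandwidth values can be in any unit, as long as the measured and target values use the same one.

```python
def next_thread_count(current, measured_bw, target_bw, maximum,
                      tolerance=0.10, minimum=1):
    """Return the thread count to use for the next measurement interval.

    Hypothetical helper, not taken from ratelimit.patch.
    current      -- threads running now (initially fetcher.threads.fetch)
    measured_bw  -- bandwidth observed over the last interval
    target_bw    -- desired bandwidth (fetcher.threads.bandwidth)
    maximum      -- upper bound on threads (fetcher.threads.maximum)
    tolerance    -- assumed dead band around the target (10% here)
    minimum      -- assumed lower bound on threads
    """
    if target_bw <= 0:
        raise ValueError("target_bw must be positive")

    if measured_bw > target_bw * (1 + tolerance):
        proposed = current - 1   # over target: drop one thread
    elif measured_bw < target_bw * (1 - tolerance):
        proposed = current + 1   # under target: add one thread
    else:
        proposed = current       # within the dead band: leave unchanged

    # Clamp to the allowed range.
    return max(minimum, min(maximum, proposed))


if __name__ == "__main__":
    threads = 10
    # Simulated measurements against a target of 100 units.
    for measured in (180.0, 150.0, 120.0, 104.0, 70.0):
        threads = next_thread_count(threads, measured, 100.0, maximum=20)
        print(measured, threads)
```

A one-thread step reacts slowly but avoids overshooting; a real implementation might scale the step by how far the measurement is from the target.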