[jira] [Commented] (NUTCH-207) Bandwidth target for fetcher rather than a thread count

Julien Nioche (JIRA) Thu, 17 Apr 2014 07:35:53 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972984#comment-13972984 ]


Julien Nioche commented on NUTCH-207:
-------------------------------------

I am starting to think that the cleanest way to implement this would be to 
make some radical changes to the way the Fetcher works and use the Executor 
framework. ThreadPoolExecutor is a nice fit because it defines a maximum 
number of threads to use, but it would require changing the logic in the 
Fetcher so that the queues push tasks to the Executor instead of having the 
FetcherThreads poll the queues for work. I will probably open a new issue for 
this.
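
To make the push model concrete, here is a minimal sketch of what handing 
tasks to a ThreadPoolExecutor could look like. It is not taken from the Nutch 
code base: the class name PushFetcherSketch, the method fetchAll, the queue 
size and the timeouts are all assumptions made for illustration, and the 
actual fetch is replaced by a counter.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names exist in Nutch.
class PushFetcherSketch {

    /**
     * Pushes one task per URL to a bounded thread pool and returns the
     * number of tasks that ran. maxThreads must be at least 1.
     */
    static int fetchAll(List<String> urls, int maxThreads) {
        AtomicInteger fetched = new AtomicInteger();

        // The pool never uses more than maxThreads worker threads. The
        // bounded queue plus CallerRunsPolicy gives simple back-pressure:
        // when the queue is full, the submitting thread runs the task itself.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(maxThreads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (String url : urls) {
            // The producer pushes work; no worker thread polls a queue itself.
            executor.execute(() -> {
                // Placeholder for the real fetch of `url` (assumption).
                fetched.incrementAndGet();
            });
        }

        // Stop accepting new tasks and wait for the submitted ones to finish.
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fetched.get();
    }
}
```

One reason this shape might suit a bandwidth target is that 
ThreadPoolExecutor allows its pool size to be changed at runtime through 
setCorePoolSize and setMaximumPoolSize, so the number of active threads 
could be adjusted without restarting the workers.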

> Bandwidth target for fetcher rather than a thread count
> -------------------------------------------------------
>
>                 Key: NUTCH-207
>                 URL: https://issues.apache.org/jira/browse/NUTCH-207
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Rod Taylor
>            Assignee: Julien Nioche
>             Fix For: 1.9
>
>         Attachments: ratelimit.patch
>
>
> Increases or decreases the number of threads from the starting value 
> (fetcher.threads.fetch) up to a maximum (fetcher.threads.maximum) to achieve 
> a target bandwidth (fetcher.threads.bandwidth).
> It seems to be able to keep within 10% of the target bandwidth even when 
> large numbers of errors are found or when a number of large pages are 
> encountered.
> To achieve more accurate tracking Nutch should keep track of protocol 
> overhead as well as the volume of pages downloaded.
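
The adjustment described above can be sketched roughly as follows. The 
attached ratelimit.patch is not reproduced here, so this is only a guess at 
the general idea: the class BandwidthThreadTuner, the method nextThreadCount, 
the one-thread step and the 10% dead band are all assumptions, and the unit 
of the bandwidth values is left open since the description does not state it.

```java
// Hypothetical sketch: not the logic of ratelimit.patch.
class BandwidthThreadTuner {

    // Assumed dead band around the target in which the count is left alone.
    static final double TOLERANCE = 0.10;

    /**
     * Returns the thread count to use for the next interval.
     *
     * @param current  threads currently in use
     * @param min      lower bound, e.g. the value of fetcher.threads.fetch
     * @param max      upper bound, e.g. the value of fetcher.threads.maximum
     * @param measured bandwidth observed over the last interval
     * @param target   desired bandwidth, e.g. fetcher.threads.bandwidth
     */
    static int nextThreadCount(int current, int min, int max,
                               double measured, double target) {
        if (measured < target * (1.0 - TOLERANCE)) {
            // Below target: add one thread, but never exceed the maximum.
            return Math.min(current + 1, max);
        }
        if (measured > target * (1.0 + TOLERANCE)) {
            // Above target: drop one thread, but never go below the minimum.
            return Math.max(current - 1, min);
        }
        // Within the dead band: keep the current count.
        return current;
    }
}
```

Measuring only page bodies would undercount the traffic, which is why the 
description suggests also tracking protocol overhead; in this sketch that 
would simply mean feeding a more complete figure into the measured argument.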



--
This message was sent by Atlassian JIRA
(v6.2#6252)
