[ 
https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256485#comment-13256485
 ] 

Ferdy Galema commented on NUTCH-1297:
-------------------------------------

@Julien
Are you sure that property addresses the issue described by Behnam? It seems 
this is about giving priority to queues that have more items in them. For 
example, when all queues are eligible for fetching but there are fewer fetcher 
threads than queues, the best strategy is to pick items from the biggest 
queues first. It is a way to reduce a possible long tail.
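
To make the long-tail argument concrete, here is a rough sketch of a toy model. 
It is not Nutch's FetchItemQueues code: the class QueueSelectionSketch and all 
of its methods are hypothetical names, and it assumes every fetch takes one 
round and each queue serves at most one thread per round.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only (hypothetical names, not the Nutch API).
// Assumption: each round, every thread takes one item from a distinct
// non-empty queue, and all fetches take the same amount of time.
class QueueSelectionSketch {

    /** Indices of the non-empty queues, in plain iteration order. */
    static List<Integer> inOrder(int[] remaining) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < remaining.length; i++) {
            if (remaining[i] > 0) {
                ids.add(i);
            }
        }
        return ids;
    }

    /** The same queues, sorted so that the fullest queue comes first. */
    static List<Integer> largestFirst(int[] remaining) {
        List<Integer> ids = inOrder(remaining);
        ids.sort(Comparator.comparingInt((Integer i) -> remaining[i]).reversed());
        return ids;
    }

    /** Number of rounds needed until every queue is empty. */
    static int roundsToDrain(int[] sizes, int threads, boolean preferLargest) {
        int[] remaining = sizes.clone();
        int rounds = 0;
        while (true) {
            List<Integer> candidates =
                preferLargest ? largestFirst(remaining) : inOrder(remaining);
            if (candidates.isEmpty()) {
                return rounds;
            }
            // Only as many queues as there are threads get served this round.
            int served = Math.min(threads, candidates.size());
            for (int k = 0; k < served; k++) {
                remaining[candidates.get(k)]--;
            }
            rounds++;
        }
    }
}
```

With queue sizes {10, 10, 1000} and 2 threads, this model drains in 1010 
rounds when queues are taken in iteration order and in 1000 rounds when the 
largest queues are taken first. The big queue is never left waiting, so the 
tail is shorter.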
                
> it is better for fetchItemQueues to select items from greater queues first
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1297
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1297
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.4
>            Reporter: behnam nikbakht
>         Attachments: NUTCH-1297.patch
>
>
> When a fetch covers multiple hosts of different sizes, large hosts can wait a 
> long time before getFetchItem() in the FetchItemQueues class selects a URL 
> from them, so we could give them higher priority.
> For example, with 10 URLs from host1, 1000 URLs from host2, and 5 threads: if 
> all threads select from host1 first, the fetch takes longer than if the 
> threads select from host2 first and only fall back to host1 while host2 is 
> busy.


        
