[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833927#comment-13833927 ]
Otis Gospodnetic commented on NUTCH-1297:
-----------------------------------------
bq. I think it was long as in 'has many URLs in it', not necessarily as in
'takes a lot of time'.
Right, that's what I meant, too. A queue could be longer than others because
it's "draining" (much) more slowly than other queues *because* fetching pages
from that queue's host is going (much) slower than fetching from other hosts.
If you then give this queue priority over other queues that may be smaller
because their hosts are faster, would that lead to higher fetch throughput in
the end? Or a shorter fetch phase?
> it is better for fetchItemQueues to select items from greater queues first
> --------------------------------------------------------------------------
>
> Key: NUTCH-1297
> URL: https://issues.apache.org/jira/browse/NUTCH-1297
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.4
> Reporter: behnam nikbakht
> Labels: fetch_queues
> Fix For: 1.8
>
> Attachments: NUTCH-1297.patch
>
>
> When a fetch covers multiple hosts whose queues differ in size, large hosts
> can wait a long time before getFetchItem() in the FetchItemQueues class
> selects a URL from them, so we could give them higher priority.
> For example, with 10 URLs from host1, 1000 URLs from host2, and 5 threads:
> if all threads select from host1 first, the fetch takes longer than if the
> threads select from host2 first and fall back to host1 only when host2 is
> busy.
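To make the trade-off discussed above concrete, here is a toy simulation. It is not Nutch code: the simulate() function, the host names, and the timings are all made up for illustration. It assumes at most one in-flight request per host, a fixed number of fetcher threads, and a fixed per-request time for each host (fetch time plus any politeness delay folded together). It compares picking the largest ready queue first against picking the smallest ready queue first.

```python
import heapq


def simulate(queue_sizes, fetch_secs, num_threads, pick_largest):
    """Return the time at which the last URL finishes (toy model).

    queue_sizes:  {host: number of URLs queued}       (hypothetical input)
    fetch_secs:   {host: seconds per request}         (hypothetical input)
    pick_largest: True  -> free threads take the largest ready queue first
                  False -> free threads take the smallest ready queue first
    Assumption: a host serves only one request at a time.
    """
    remaining = dict(queue_sizes)
    busy = set()      # hosts that currently have a request in flight
    running = []      # min-heap of (finish_time, host)
    now = 0.0
    while any(remaining.values()) or running:
        # Hosts that still have URLs and no request in flight.
        ready = [h for h, n in remaining.items() if n > 0 and h not in busy]
        ready.sort(key=lambda h: (remaining[h], h), reverse=pick_largest)
        # Hand out work while there are free threads and ready hosts.
        while ready and len(running) < num_threads:
            host = ready.pop(0)
            remaining[host] -= 1
            busy.add(host)
            heapq.heappush(running, (now + fetch_secs[host], host))
        # Jump to the next completion and release every fetch done by then.
        now, host = heapq.heappop(running)
        busy.discard(host)
        while running and running[0][0] <= now:
            busy.discard(heapq.heappop(running)[1])
    return now


if __name__ == "__main__":
    # Case 1: all hosts equally fast, one queue simply has more URLs.
    sizes = {"A": 4, "B": 1, "C": 1}
    equal = {"A": 1.0, "B": 1.0, "C": 1.0}
    print(simulate(sizes, equal, 2, True), simulate(sizes, equal, 2, False))

    # Case 2: the long queue belongs to a slow host.
    sizes = {"slow": 4, "fast1": 1, "fast2": 1}
    secs = {"slow": 3.0, "fast1": 1.0, "fast2": 1.0}
    print(simulate(sizes, secs, 2, True), simulate(sizes, secs, 2, False))
```

In this toy model, largest-first finishes case 1 in 4 seconds versus 5 for smallest-first. In case 2 it finishes in 12 seconds versus 13, but 12 is also the minimum the slow host needs on its own (4 requests at 3 seconds each). So under these assumptions, prioritizing the long queue shortens the fetch phase only by starting the bottleneck host earlier; it does not make that host drain any faster. Real behavior depends on the actual thread count, crawl delays, and per-queue thread limits, none of which are modeled here.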