[ 
https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221908#comment-13221908
 ] 

Julien Nioche commented on NUTCH-1297:
--------------------------------------

This can already be addressed by giving a larger value to this parameter:

{noformat} 
<property>
  <name>fetcher.queue.depth.multiplier</name>
  <value>50</value>
  <description>(EXPERT)The fetcher buffers the incoming URLs into queues based on the [host|domain|IP]
  (see param fetcher.queue.mode). The depth of the queue is the number of threads times the value of this parameter.
  A large value requires more memory but can improve the performance of the fetch when the order of the URLS in the fetch list
  is not optimal.
  </description>
</property>
{noformat} 
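
As a rough illustration of the arithmetic in that description (depth = threads times multiplier), here is a minimal sketch. The class and method names are hypothetical and are not part of Nutch; the thread and URL counts are taken from the example in the issue description, and the sketch simply takes the property description at face value.

```java
public class QueueDepthSketch {

    // Depth as stated in the property description:
    // number of fetcher threads times fetcher.queue.depth.multiplier.
    public static int queueDepth(int fetcherThreads, int depthMultiplier) {
        return fetcherThreads * depthMultiplier;
    }

    // Hypothetical helper: smallest multiplier for which the depth
    // is at least totalUrls (integer ceiling of totalUrls / threads).
    public static int minMultiplier(int totalUrls, int fetcherThreads) {
        return (totalUrls + fetcherThreads - 1) / fetcherThreads;
    }

    public static void main(String[] args) {
        int threads = 5;            // thread count from the issue's example
        int totalUrls = 10 + 1000;  // host1 + host2 from the issue's example

        System.out.println("depth with multiplier 50: " + queueDepth(threads, 50));
        System.out.println("multiplier needed to hold all URLs: "
                + minMultiplier(totalUrls, threads));
    }
}
```

Under these assumptions a multiplier of 50 gives a depth of 250, which is less than the 1010 URLs in the example, so a larger value would be needed for the whole list to be buffered at once.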



                
> it is better for fetchItemQueues to select items from greater queues first
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1297
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1297
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.4
>            Reporter: behnam nikbakht
>         Attachments: NUTCH-1297.patch
>
>
> When a fetch covers multiple hosts whose URL counts differ greatly, large 
> hosts can wait a long time before getFetchItem() in the FetchItemQueues 
> class selects a URL from them, so they should be given higher priority.
> For example, with 10 URLs from host1, 1000 URLs from host2, and 5 threads: 
> if all threads select from host1 first, the fetch takes longer than if the 
> threads select from host2 first and fall back to host1 only while host2 
> is busy.
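
The selection order proposed in the issue could be sketched roughly as follows. This is not the attached patch and not Nutch's FetchItemQueues code: the class, the map of per-host queues, and the set of busy hosts are all hypothetical stand-ins used only to illustrate "largest non-busy queue first".

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LargestQueueFirstSketch {

    // Returns the host whose queue currently holds the most URLs,
    // skipping empty queues and hosts marked as busy.
    // Returns null when no queue is eligible.
    public static String pickHost(Map<String, Deque<String>> queues,
                                  Set<String> busyHosts) {
        String best = null;
        int bestSize = 0;
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            int size = e.getValue().size();
            if (size > bestSize && !busyHosts.contains(e.getKey())) {
                best = e.getKey();
                bestSize = size;
            }
        }
        return best;
    }

    // Builds a queue of n placeholder URLs for the given host.
    public static Deque<String> filled(String host, int n) {
        Deque<String> q = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            q.add("http://" + host + "/page" + i);
        }
        return q;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> queues = new LinkedHashMap<>();
        queues.put("host1", filled("host1", 10));   // small host from the issue
        queues.put("host2", filled("host2", 1000)); // large host from the issue

        Set<String> noneBusy = Collections.emptySet();
        System.out.println(pickHost(queues, noneBusy));

        // While host2 is busy, the selection falls back to host1.
        System.out.println(pickHost(queues, Collections.singleton("host2")));
    }
}
```

With the issue's numbers, the sketch picks host2 while it is free and falls back to host1 only while host2 is busy, which is the ordering the reporter argues for.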


        
