[ 
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated NUTCH-1687:
-------------------------------

    Attachment: NUTCH-1687.tejasp.v1.patch

I don't think we need a separate class for a circular linked list, or to
maintain that circular list alongside the original map.

Uploading "NUTCH-1687.tejasp.v1.patch": it uses a
[LinkedHashMap|http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html]
together with a
[Guava cyclic iterator|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterables.html#cycle(java.lang.Iterable)]
to iterate over the map of queues in a circular fashion, so no separate list
needs to be maintained.
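
For discussion, here is a minimal sketch of the idea. It is not the code in the patch. To stay self-contained it uses a small plain-JDK `cycle` helper as a stand-in for Guava's `Iterables.cycle`, and the class name `RoundRobinSketch`, the `String` URLs and the `Deque` queues are all hypothetical simplifications of the fetcher's real types. It assumes Java 8 or later and that no queues are added or removed once the cursor exists.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class RoundRobinSketch {
    // Insertion-ordered map of queue id -> pending URLs (hypothetical simplification).
    private final Map<String, Deque<String>> queues;
    // Cursor that survives between calls, so each call resumes where the last one stopped.
    private final Iterator<Map.Entry<String, Deque<String>>> cursor;

    public RoundRobinSketch(LinkedHashMap<String, Deque<String>> queues) {
        this.queues = queues;
        this.cursor = cycle(queues.entrySet());
    }

    // Plain-JDK stand-in for a cyclic iterator: restarts the underlying
    // iterator whenever it runs out. Assumes the iterable is not modified
    // structurally while the cursor is in use.
    static <T> Iterator<T> cycle(final Iterable<T> iterable) {
        return new Iterator<T>() {
            private Iterator<T> current = iterable.iterator();

            @Override
            public boolean hasNext() {
                if (!current.hasNext()) {
                    current = iterable.iterator(); // wrap around to the first entry
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    // Visits each queue at most once per call, continuing from the cursor
    // position instead of restarting at the head of the map.
    public synchronized String getFetchItem() {
        for (int i = 0; i < queues.size() && cursor.hasNext(); i++) {
            String url = cursor.next().getValue().poll();
            if (url != null) {
                return url;
            }
        }
        return null; // every queue is empty
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Deque<String>> queues = new LinkedHashMap<>();
        queues.put("a.example", new ArrayDeque<String>(Arrays.asList("a1", "a2", "a3")));
        queues.put("b.example", new ArrayDeque<String>(Arrays.asList("b1")));
        queues.put("c.example", new ArrayDeque<String>(Arrays.asList("c1", "c2")));

        RoundRobinSketch rr = new RoundRobinSketch(queues);
        StringBuilder order = new StringBuilder();
        for (String url = rr.getFetchItem(); url != null; url = rr.getFetchItem()) {
            order.append(url).append(' ');
        }
        System.out.println(order.toString().trim()); // prints "a1 b1 c1 a2 c2 a3"
    }
}
```

The point of the sketch is that the only extra state is the cursor; the map itself remains the single source of truth for the set of queues.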

Comments are welcome.

> Pick queue in Round Robin
> -------------------------
>
>                 Key: NUTCH-1687
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1687
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Tien Nguyen Manh
>            Priority: Minor
>             Fix For: 2.3, 1.8
>
>         Attachments: NUTCH-1687.patch, NUTCH-1687.tejasp.v1.patch
>
>
> Currently we choose the queue to pick a URL from by starting at the head of
> the queues list, so queues at the start of the list have more chance of
> being picked first. That can cause a long-tail problem: towards the end only
> a few queues are still available, and they hold many URLs.
> public synchronized FetchItem getFetchItem() {
>       // the iterator is always reset, so the search restarts from the first queue
>       final Iterator<Map.Entry<String, FetchItemQueue>> it =
>         queues.entrySet().iterator();
>       while (it.hasNext()) {
> ....
> I think it is better to pick queues in round-robin order. That can reduce
> the time needed to find an available queue, gives every queue its turn, and,
> if we use TopN in the generator, leaves no long-tail queues at the end.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
