Nguyen Manh Tien created NUTCH-1687:
---------------------------------------

             Summary: Pick queue in Round Robin
                 Key: NUTCH-1687
                 URL: https://issues.apache.org/jira/browse/NUTCH-1687
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 2.3
            Reporter: Nguyen Manh Tien
            Priority: Minor


Currently we choose the queue to pick a URL from by scanning from the start of
the queue list, so queues at the start of the list have a better chance of being
picked first. This can cause a long-tail problem: near the end of the fetch only
a few queues are still available, and they still hold many URLs.

public synchronized FetchItem getFetchItem() {
      final Iterator<Map.Entry<String, FetchItemQueue>> it =
        queues.entrySet().iterator(); // ==> always resets to search for a queue from the start
      while (it.hasNext()) {
....

I think it is better to pick queues in round robin. That can reduce the time
needed to find an available queue and ensures that all queues are picked in
turn, and if we use TopN in the generator there is no long tail of queues at
the end.
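
A minimal sketch of the round-robin idea, not the actual Nutch code: the class
RoundRobinQueues, its fields, and the use of plain String URLs in place of
FetchItem/FetchItemQueue are all hypothetical simplifications. It remembers
where the last pick ended and starts the next search from the following queue,
instead of resetting to the start of the list on every call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the fetcher's queue holder.
class RoundRobinQueues {
    private final Map<String, Deque<String>> queues = new HashMap<>();
    // Queue ids in the order the queues were first created.
    private final List<String> order = new ArrayList<>();
    // Index of the queue to try first on the next call.
    private int next = 0;

    public synchronized void addUrl(String queueId, String url) {
        Deque<String> q = queues.get(queueId);
        if (q == null) {
            q = new ArrayDeque<>();
            queues.put(queueId, q);
            order.add(queueId);
        }
        q.add(url);
    }

    public synchronized String getFetchItem() {
        final int n = order.size();
        // Visit each queue at most once, starting where the last pick ended.
        for (int i = 0; i < n; i++) {
            int idx = (next + i) % n;
            String url = queues.get(order.get(idx)).poll();
            if (url != null) {
                next = (idx + 1) % n; // resume after the queue just served
                return url;
            }
        }
        return null; // every queue is empty
    }
}
```

With queues a = [a1, a2, a3], b = [b1] and c = [c1, c2], successive calls
return a1, b1, c1, a2, c2, a3, so no queue is drained ahead of the others. The
sketch leaves out what the real fetcher also has to handle, such as per-queue
politeness delays and removal of empty queues.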




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
