[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nguyen Manh Tien updated NUTCH-1687:
------------------------------------

    Component/s: fetcher

> Pick queue in Round Robin
> -------------------------
>
>                 Key: NUTCH-1687
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1687
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 2.3
>            Reporter: Nguyen Manh Tien
>            Priority: Minor
>         Attachments: NUTCH-1687.patch
>
>
> Currently we choose the queue to pick a URL from by scanning from the start
> of the queues list, so queues at the start of the list have a greater chance
> of being picked first. This can cause a long-tail problem: near the end of
> the fetch, only a few queues remain available, and those queues still hold
> many URLs.
>
> public synchronized FetchItem getFetchItem() {
>   final Iterator<Map.Entry<String, FetchItemQueue>> it =
>       queues.entrySet().iterator(); // always restarts the search from the first queue
>   while (it.hasNext()) {
>     ....
>
> I think it is better to pick queues in round robin. That would reduce the
> time needed to find an available queue, ensure every queue gets picked in
> turn, and, if we use TopN during generation, leave no long-tail queues at
> the end.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
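The round-robin selection proposed in the issue can be sketched as below. This is a minimal, hypothetical illustration, not the attached NUTCH-1687.patch: the class `RoundRobinQueues`, its `SimpleQueue` stand-in, the `add` method and the `cursor` field are all invented for the example, and `getFetchItem()` here returns a plain URL string instead of a Nutch `FetchItem`. It also omits the politeness delays, in-progress limits and empty-queue cleanup a real fetcher queue has. The key idea is that the scan resumes just after the last queue served instead of restarting from the first one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: round-robin selection across per-host URL queues.
public class RoundRobinQueues {
  // Hypothetical stand-in for a fetch queue: just a FIFO of URLs.
  static class SimpleQueue {
    final Deque<String> urls = new ArrayDeque<>();
  }

  private final Map<String, SimpleQueue> queues = new LinkedHashMap<>();
  private final List<String> order = new ArrayList<>(); // queue IDs in insertion order
  private int cursor = 0; // index of the queue to try first on the next call

  public synchronized void add(String queueId, String url) {
    SimpleQueue q = queues.get(queueId);
    if (q == null) {
      q = new SimpleQueue();
      queues.put(queueId, q);
      order.add(queueId);
    }
    q.urls.add(url);
  }

  // Returns the next URL, or null if every queue is empty.
  public synchronized String getFetchItem() {
    int n = order.size();
    for (int i = 0; i < n; i++) {
      int idx = (cursor + i) % n; // start at the cursor and wrap around
      String url = queues.get(order.get(idx)).urls.poll();
      if (url != null) {
        cursor = (idx + 1) % n; // the next call starts after the queue just used
        return url;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    RoundRobinQueues rr = new RoundRobinQueues();
    rr.add("a.com", "a1"); rr.add("a.com", "a2"); rr.add("a.com", "a3");
    rr.add("b.com", "b1");
    rr.add("c.com", "c1"); rr.add("c.com", "c2");
    StringBuilder sb = new StringBuilder();
    String url;
    while ((url = rr.getFetchItem()) != null) {
      sb.append(url).append(' ');
    }
    System.out.println(sb.toString().trim()); // prints "a1 b1 c1 a2 c2 a3"
  }
}
```

With the start-of-list scan described in the issue, the same input would drain `a.com` first (a1 a2 a3 b1 c1 c2); the rotating cursor interleaves the hosts instead. If queues are removed when they empty, the cursor would need adjusting so it still points at a valid position.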