[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859275#comment-13859275 ]
Tejas Patil commented on NUTCH-1687:
------------------------------------
This is a good point by [~tiennm]. Although this might not give a significant
performance improvement, it would distribute requests fairly across all fetch
queues.
Some comments wrt the patch:
1. Do you really need to make the methods of the CircularLinkedList class
thread-safe? The methods in "FetchItemQueues" that interact with the
CircularLinkedList (i.e. getFetchItemQueue and getFetchItem) are all
synchronized, so it's ensured that only one thread accesses the list at a time.
2. Why is 'id' needed in FetchItemQueue?
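To illustrate point 1, here is a minimal sketch (not the patch's code) assuming a hypothetical, simplified stand-in for FetchItemQueues: the ring of queue IDs is a plain, unsynchronized ArrayDeque, and every method that touches it is `synchronized` on the owning object. The names `QueueRing`, `add` and `next` are hypothetical. Because the outer monitor already serializes all callers, the ring itself needs no locking of its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedRingDemo {
    // Hypothetical stand-in for FetchItemQueues: the ring is NOT thread-safe,
    // but all access to it goes through methods synchronized on this object.
    static class QueueRing {
        private final Deque<String> ring = new ArrayDeque<>(); // plain, unsynchronized

        synchronized void add(String queueId) {
            ring.addLast(queueId);
        }

        // Rotates the ring: returns the head and moves it to the tail.
        synchronized String next() {
            String head = ring.pollFirst();
            if (head != null) {
                ring.addLast(head);
            }
            return head;
        }

        synchronized int size() {
            return ring.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        QueueRing queues = new QueueRing();
        for (int i = 0; i < 4; i++) {
            queues.add("host" + i);
        }
        // Many threads rotate the ring concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    queues.next();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // The outer lock kept every rotation atomic: no ID lost or duplicated.
        System.out.println(queues.size()); // 4
    }
}
```

The same reasoning applies to the patch: as long as CircularLinkedList is only reached through synchronized FetchItemQueues methods, making its own methods synchronized adds locking cost without adding safety.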
> Pick queue in Round Robin
> -------------------------
>
> Key: NUTCH-1687
> URL: https://issues.apache.org/jira/browse/NUTCH-1687
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Reporter: Tien Nguyen Manh
> Priority: Minor
> Fix For: 2.3, 1.8
>
> Attachments: NUTCH-1687.patch
>
>
> Currently we choose the queue to pick a URL from by scanning from the start of
> the queues list, so queues at the start of the list have more chance to be
> picked first. That can cause a long-tail problem, where at the end only a few
> queues remain available, each holding many URLs.
> public synchronized FetchItem getFetchItem() {
>   // the iterator is recreated on every call, so the search always
>   // restarts from the first queue
>   final Iterator<Map.Entry<String, FetchItemQueue>> it =
>       queues.entrySet().iterator();
>   while (it.hasNext()) {
>     ....
> I think it is better to pick queues in round robin. That would reduce the time
> needed to find an available queue, make every queue get picked in turn, and,
> if we use TopN during generation, leave no long-tail queues at the end.
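A minimal sketch of the round-robin idea proposed above, under simplifying assumptions: `SimpleQueue` is a hypothetical stand-in for FetchItemQueue, and its `ready` flag stands in for Nutch's per-queue politeness check (crawl delay, thread limit). Instead of recreating an iterator from the head on every call, the sketch keeps a cursor and starts each scan just after the queue that was used last, wrapping around the list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RoundRobinQueues {
    // Hypothetical simplified fetch queue: a host's pending URLs plus a
    // readiness flag standing in for the politeness/crawl-delay check.
    static class SimpleQueue {
        final String host;
        final Deque<String> urls = new ArrayDeque<>();
        boolean ready = true;

        SimpleQueue(String host) {
            this.host = host;
        }
    }

    private final List<SimpleQueue> queues = new ArrayList<>();
    private int cursor = 0; // index of the queue to try first on the next call

    synchronized void addQueue(SimpleQueue q) {
        queues.add(q);
    }

    // Scans at most queues.size() queues, starting at the cursor and wrapping
    // around, instead of always restarting from the head of the list.
    synchronized String getFetchItem() {
        int n = queues.size();
        for (int i = 0; i < n; i++) {
            int idx = (cursor + i) % n;
            SimpleQueue q = queues.get(idx);
            if (q.ready && !q.urls.isEmpty()) {
                cursor = (idx + 1) % n; // next call starts after the queue just used
                return q.urls.pollFirst();
            }
        }
        return null; // no queue currently has a ready item
    }

    public static void main(String[] args) {
        RoundRobinQueues rr = new RoundRobinQueues();
        for (String host : new String[] {"a.com", "b.com", "c.com"}) {
            SimpleQueue q = new SimpleQueue(host);
            for (int i = 0; i < 2; i++) {
                q.urls.add("http://" + host + "/" + i);
            }
            rr.addQueue(q);
        }
        String url;
        while ((url = rr.getFetchItem()) != null) {
            System.out.println(url);
        }
    }
}
```

Running it prints the URLs interleaved across hosts (a.com/0, b.com/0, c.com/0, a.com/1, b.com/1, c.com/1) rather than draining a.com first, which is the fairness the proposal is after. A queue that is not ready is simply skipped, and the cursor only advances past a queue that actually supplied an item.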
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)