[ 
https://issues.apache.org/jira/browse/NUTCH-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531240#comment-17531240
 ] 

ASF GitHub Bot commented on NUTCH-2947:
---------------------------------------

sebastian-nagel opened a new pull request, #729:
URL: https://github.com/apache/nutch/pull/729

   Fetcher: to ensure politeness, keep the state of empty but stateful fetch 
queues unless the queue feeder is finished. A queue is considered stateful if:
   - next fetch time not yet reached
   - non-zero exception counter and queue feeder still
     adding new fetch items to queues
   
   Only if the queue feeder is finished and no more new fetch items are 
added can these queues finally be removed.
   
   Note: this PR needs to be adapted to #728 (NUTCH-2946) or vice versa, 
whichever is merged first. The state of queues also needs to be preserved in 
case fetcher.max.exceptions.per.queue == -1 but 
fetcher.exceptions.per.queue.delay != -1.
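
The retention rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code of this PR or of Nutch's fetcher classes: the names `QueueRetentionSketch`, `HostQueue`, `isStateful` and `purgeEmptyQueues` are hypothetical, and the rule is only one reading of the description (an empty queue is kept while the queue feeder is alive and the queue still carries a pending next fetch time or a non-zero exception counter).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the retention rule; not the actual Nutch code. */
class QueueRetentionSketch {

  /** Per-host queue: pending items plus the politeness state. */
  static class HostQueue {
    int pendingItems = 0;      // fetch items currently waiting in the queue
    long nextFetchTime = 0L;   // earliest time (ms) the next fetch is allowed
    int exceptionCounter = 0;  // exceptions observed for this host so far

    /** A queue is stateful if dropping it would lose politeness information. */
    boolean isStateful(long now) {
      return nextFetchTime > now || exceptionCounter > 0;
    }
  }

  private final Map<String, HostQueue> queues = new HashMap<>();

  /** Returns the queue for the given id, creating an empty one if needed. */
  HostQueue getOrCreate(String id) {
    return queues.computeIfAbsent(id, k -> new HostQueue());
  }

  boolean contains(String id) {
    return queues.containsKey(id);
  }

  /**
   * Removes empty queues and returns how many were removed. While the feeder
   * is alive, empty queues that still hold state are kept, because the feeder
   * may add items for the same host again.
   */
  int purgeEmptyQueues(long now, boolean feederAlive) {
    int before = queues.size();
    queues.values().removeIf(
        q -> q.pendingItems == 0 && !(feederAlive && q.isStateful(now)));
    return before - queues.size();
  }

  public static void main(String[] args) {
    QueueRetentionSketch sketch = new QueueRetentionSketch();
    sketch.getOrCreate("foo.bar").exceptionCounter = 2;
    // Feeder still running: the empty queue keeps its exception counter.
    System.out.println("removed: " + sketch.purgeEmptyQueues(1000L, true));
    // Feeder finished: the queue can finally be dropped.
    System.out.println("removed: " + sketch.purgeEmptyQueues(1000L, false));
  }
}
```

With a rule of this kind, a queue that is drained and later refilled by the feeder continues from its previous exception counter and next fetch time instead of starting from scratch.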




> Fetcher: keep state of empty fetch queues unless queue feeder is finished
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2947
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2947
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> If a fetch queue is empty (containing no fetch items) it may be removed from 
> the list of queues. This also removes the state of the fetch queue, namely the 
> next fetch time and the exception counter. If the queue feeder is still 
> active, it may happen that the same queue (i.e. one associated with the same 
> host/domain/IP) that was removed before is created again. In this case, certain 
> aspects of fetcher politeness can no longer be guaranteed:
> - the fetch delay (via earliest next fetch time) and
> - the mechanism to block fetching from the same host/domain/IP with too many 
> exceptions (NUTCH-769).
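> The issue was observed in the fetcher logs while verifying NUTCH-2946. The 
> exception counter repeatedly falls back to 1, which indicates that the queue 
> was removed and created again in between: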
> {noformat}
> ... 10:19:16,912 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:20:16,250 * queue foo.bar >> delayed next fetch by 79248 ms after 2 
> exceptions in queue
> ... 10:21:52,675 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:25:40,931 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:27:45,066 * queue foo.bar >> delayed next fetch by 79248 ms after 2 
> exceptions in queue
> ... 10:29:40,407 * queue foo.bar >> delayed next fetch by 100000 ms after 3 
> exceptions in queue
> ... 10:41:48,870 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:47:54,946 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:52:46,792 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 10:57:43,470 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:01:12,220 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:04:24,621 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:18:40,398 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:21:09,437 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:34:36,052 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:39:17,898 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:40:35,472 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:50:34,224 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:51:27,547 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:53:04,783 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 11:54:04,404 * queue foo.bar >> delayed next fetch by 79248 ms after 2 
> exceptions in queue
> ... 11:55:38,232 * queue foo.bar >> delayed next fetch by 100000 ms after 3 
> exceptions in queue
> ... 11:57:37,942 * queue foo.bar >> delayed next fetch by 116096 ms after 4 
> exceptions in queue
> ... 12:01:08,619 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> ... 12:02:35,985 * queue foo.bar >> delayed next fetch by 50000 ms after 1 
> exceptions in queue
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
