Sebastian Nagel created NUTCH-2947:
--------------------------------------

             Summary: Fetcher: keep state of empty fetch queues unless queue feeder is finished
                 Key: NUTCH-2947
                 URL: https://issues.apache.org/jira/browse/NUTCH-2947
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.18
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.19


If a fetch queue is empty (it contains no fetch items), it may be removed from 
the list of queues. This also removes the state of the fetch queue, namely the 
next fetch time and the exception counter. If the queue feeder is still active, 
it may happen that a queue removed earlier is created again for the same 
host/domain/IP, but with fresh state. In this case, certain aspects of fetcher 
politeness can no longer be guaranteed:
- the fetch delay (via earliest next fetch time) and
- the mechanism to block fetching from the same host/domain/IP with too many 
exceptions (NUTCH-769).
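
The behavior proposed in the summary could be sketched as follows. This is a 
minimal illustration only, not the Nutch fetcher code: the class 
QueueStateSketch, its fields, and its methods are hypothetical names made up 
for this example. The idea is that empty queues are purged only once the 
queue feeder has finished, so that next fetch time and exception counter 
survive while new items for the same host/domain/IP may still arrive.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names do not correspond to actual Nutch classes. */
class QueueStateSketch {

  /** Per-queue politeness state that is lost if the queue is removed. */
  static final class QueueState {
    int pendingItems = 0;       // fetch items currently in the queue
    long nextFetchTime = 0L;    // earliest time (ms) the next fetch is allowed
    int exceptionCounter = 0;   // exceptions seen for this host/domain/IP
  }

  private final Map<String, QueueState> queues = new HashMap<>();
  private boolean feederAlive = true;

  /** Returns the queue state for the given id, creating it if necessary. */
  QueueState getOrCreate(String queueId) {
    return queues.computeIfAbsent(queueId, id -> new QueueState());
  }

  /** Signals that the queue feeder will not add any further items. */
  void feederFinished() {
    feederAlive = false;
  }

  /**
   * Removes empty queues, but only after the feeder has finished.
   * While the feeder is alive, empty queues and their state are kept.
   *
   * @return number of queues removed
   */
  int purgeEmptyQueues() {
    if (feederAlive) {
      return 0; // keep state: the same queue may be fed again
    }
    int before = queues.size();
    queues.values().removeIf(q -> q.pendingItems == 0);
    return before - queues.size();
  }

  public static void main(String[] args) {
    QueueStateSketch sketch = new QueueStateSketch();
    sketch.getOrCreate("foo.bar").exceptionCounter = 2;

    // Feeder still active: the empty queue and its counter are kept.
    System.out.println(sketch.purgeEmptyQueues());
    System.out.println(sketch.getOrCreate("foo.bar").exceptionCounter);

    // Feeder finished: the empty queue can be dropped safely.
    sketch.feederFinished();
    System.out.println(sketch.purgeEmptyQueues());
  }
}
```

The trade-off in this sketch is memory: empty queues stay in the map until 
the feeder is done, in exchange for politeness state that is not reset.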

The issue was observed in the fetcher logs while verifying NUTCH-2946. Note 
how the exception counter of the queue foo.bar repeatedly falls back to 1:
{noformat}
... 10:19:16,912 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:20:16,250 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
... 10:21:52,675 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:25:40,931 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:27:45,066 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
... 10:29:40,407 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
... 10:41:48,870 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:47:54,946 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:52:46,792 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 10:57:43,470 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:01:12,220 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:04:24,621 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:18:40,398 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:21:09,437 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:34:36,052 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:39:17,898 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:40:35,472 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:50:34,224 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:51:27,547 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:53:04,783 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 11:54:04,404 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
... 11:55:38,232 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
... 11:57:37,942 * queue foo.bar >> delayed next fetch by 116096 ms after 4 exceptions in queue
... 12:01:08,619 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
... 12:02:35,985 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
{noformat}
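
The delays in the log are numerically consistent with a base delay of 
50000 ms scaled by log2(1 + number of exceptions). This relation is inferred 
from the logged values only and is an assumption, not something taken from 
the Nutch source code. The class and method names below are hypothetical. 
The sketch shows why a reset counter matters: each time the queue state is 
lost, the delay drops back to the base value.

```java
/** Hypothetical sketch reproducing the delays seen in the log. */
final class BackoffDelaySketch {

  /**
   * Assumed relation: delay = baseDelayMs * log2(1 + exceptions),
   * rounded to whole milliseconds. Inferred from the log, not from Nutch code.
   */
  static long delayMs(long baseDelayMs, int exceptions) {
    double log2 = Math.log(1.0 + exceptions) / Math.log(2.0);
    return Math.round(baseDelayMs * log2);
  }

  public static void main(String[] args) {
    // Prints 50000, 79248, 100000, 116096 for 1 to 4 exceptions,
    // matching the values in the log excerpt.
    for (int n = 1; n <= 4; n++) {
      System.out.println(n + " exceptions -> " + delayMs(50_000L, n) + " ms");
    }
  }
}
```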



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
