[ 
https://issues.apache.org/jira/browse/NUTCH-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046500#comment-17046500
 ] 

Hudson commented on NUTCH-2767:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3664 (See 
[https://builds.apache.org/job/Nutch-trunk/3664/])
NUTCH-2767 Fetcher to stop filling queues skipped due to repeated (snagel: 
[https://github.com/apache/nutch/commit/8e5837f58f28f1b5bf9a714513dac926eddce3f8])
* (edit) src/java/org/apache/nutch/fetcher/QueueFeeder.java
* (edit) src/java/org/apache/nutch/fetcher/FetchItemQueues.java
* (edit) src/java/org/apache/nutch/fetcher/FetchItemQueue.java
NUTCH-2767 Fetcher to stop filling queues skipped due to repeated (snagel: 
[https://github.com/apache/nutch/commit/7840cb6fabe68564aa5742594c653490ffc2ca4e])
* (edit) src/java/org/apache/nutch/fetcher/QueueFeeder.java
NUTCH-2767 Fetcher to stop filling queues skipped due to repeated (snagel: 
[https://github.com/apache/nutch/commit/35dcd42d645e4433f4e4dfc44629a651c0267156])
* (edit) src/java/org/apache/nutch/fetcher/FetchItemQueues.java
NUTCH-2767 Fetcher to stop filling queues skipped due to repeated (snagel: 
[https://github.com/apache/nutch/commit/6dd0a7f69c2b19b7734726fae99daa435514091f])
* (edit) src/java/org/apache/nutch/fetcher/Fetcher.java
* (edit) src/java/org/apache/nutch/fetcher/FetchItemQueue.java
* (edit) src/java/org/apache/nutch/fetcher/QueueFeeder.java
* (edit) src/java/org/apache/nutch/fetcher/FetchItemQueues.java


> Fetcher to stop filling queues skipped due to repeated exceptions
> -----------------------------------------------------------------
>
>                 Key: NUTCH-2767
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2767
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.17
>
>
> Since NUTCH-769 the fetcher skips URLs from queues which have already produced 
> more exceptions than configured by "fetcher.max.exceptions.per.queue". Such 
> queues are emptied when the threshold is reached. However, the QueueFeeder may 
> still be feeding queues and may add URLs again to queues which are already 
> over the exception threshold. The first URL in such a queue is then fetched; 
> the subsequent ones are removed only once the next exception is observed.
> Here is an example:
> {noformat}
> 2020-02-19 06:26:48,877 INFO [FetcherThread] o.a.n.fetcher.FetchItemQueues: * queue: www.example.com >> removed 61 URLs from queue because 40 exceptions occurred
> 2020-02-19 06:26:53,551 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 172 fetching https://www.example.com/... (queue crawl delay=5000ms)
> 2020-02-19 06:26:54,073 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 172 fetch of https://www.example.com/... failed with: ...
> 2020-02-19 06:26:58,766 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 111 fetching https://www.example.com/... (queue crawl delay=5000ms)
> 2020-02-19 06:26:59,290 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 111 fetch of https://www.example.com/... failed with: ...
> 2020-02-19 06:27:03,960 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 103 fetching https://www.example.com/... (queue crawl delay=5000ms)
> 2020-02-19 06:27:04,482 INFO [FetcherThread] o.a.n.fetcher.FetcherThread: FetcherThread 103 fetch of https://www.example.com/... failed with: ...
> 2020-02-19 06:27:04,484 INFO [FetcherThread] o.a.n.fetcher.FetchItemQueues: * queue: www.example.com >> removed 1 URLs from queue because 41 exceptions occurred
> ... (fetching again 30 URLs, all failed)
> 2020-02-19 06:28:23,578 INFO [FetcherThread] org.apache.nutch.fetcher.FetchItemQueues: * queue: www.example.com >> removed 1 URLs from queue because 42 exceptions occurred
> {noformat}
> QueueFeeder should check whether the exception threshold has already been 
> reached and, if so, not add further URLs to the queue.
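The check proposed above can be illustrated with a small, self-contained Java model. This is a simplified sketch, not the code from the commits listed above: the classes HostQueue and Queues and the methods feed, poll, recordException and overThreshold are hypothetical stand-ins for QueueFeeder and FetchItemQueues. It assumes a queue counts as over the threshold once its exception count reaches the configured maximum, and that -1 disables the check. The model is single-threaded; in the fetcher, the feeder and fetcher threads run concurrently, which is what lets the feeder refill a queue that was just emptied.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; names are illustrative, not Nutch's API.
public class ExceptionThresholdSketch {

  /** One queue per host: pending URLs plus a count of failed fetches. */
  static final class HostQueue {
    final Deque<String> pending = new ArrayDeque<>();
    int exceptions = 0;
  }

  /** Stand-in for the fetch queues shared by the feeder and fetcher threads. */
  static final class Queues {
    // Analogous to "fetcher.max.exceptions.per.queue"; -1 disables the check (assumed).
    final int maxExceptionsPerQueue;
    final Map<String, HostQueue> byHost = new HashMap<>();

    Queues(int maxExceptionsPerQueue) {
      this.maxExceptionsPerQueue = maxExceptionsPerQueue;
    }

    HostQueue get(String host) {
      return byHost.computeIfAbsent(host, h -> new HostQueue());
    }

    /** True once the host's exception count has reached the configured maximum. */
    boolean overThreshold(String host) {
      HostQueue q = byHost.get(host);
      return maxExceptionsPerQueue != -1 && q != null
          && q.exceptions >= maxExceptionsPerQueue;
    }

    /** Fetcher side: take the next URL for a host, or null if none is queued. */
    String poll(String host) {
      return get(host).pending.poll();
    }

    /** Fetcher side: record a failed fetch; purge the queue once over the threshold. */
    int recordException(String host) {
      HostQueue q = get(host);
      q.exceptions++;
      if (!overThreshold(host)) {
        return 0;
      }
      int removed = q.pending.size();
      q.pending.clear();
      return removed;
    }

    /**
     * Feeder side. checkThreshold=false mirrors the behavior reported in the
     * issue; checkThreshold=true applies the proposed check and drops the URL.
     */
    boolean feed(String host, String url, boolean checkThreshold) {
      if (checkThreshold && overThreshold(host)) {
        return false;
      }
      get(host).pending.add(url);
      return true;
    }
  }

  /** Returns how many late URLs end up queued for a host already over the threshold. */
  static int queuedAfterThreshold(boolean checkThreshold, int lateUrls) {
    Queues queues = new Queues(3);
    String host = "www.example.com";
    for (int i = 0; i < 5; i++) {
      queues.feed(host, "https://www.example.com/" + i, checkThreshold);
    }
    // Three fetches fail; the third reaches the threshold and purges the rest.
    for (int i = 0; i < 3; i++) {
      queues.poll(host);
      queues.recordException(host);
    }
    // The feeder keeps reading the fetch list and offers more URLs for the host.
    for (int i = 0; i < lateUrls; i++) {
      queues.feed(host, "https://www.example.com/late-" + i, checkThreshold);
    }
    return queues.get(host).pending.size();
  }

  public static void main(String[] args) {
    System.out.println("without feeder check: " + queuedAfterThreshold(false, 4));
    System.out.println("with feeder check:    " + queuedAfterThreshold(true, 4));
  }
}
```

Running main should print 4 for the unchecked feeder and 0 with the check. In the unchecked case each re-queued URL would be fetched and would likely fail again, which matches the pattern in the quoted log (one URL fetched, then "removed 1 URLs" on the next exception).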



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
