[ 
https://issues.apache.org/jira/browse/NUTCH-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18008455#comment-18008455
 ] 

Sebastian Nagel commented on NUTCH-3120:
----------------------------------------

Hi [~markus17],

thanks! Yes, slowing down on an HTTP 429 is necessary (or at least highly 
recommended), and this is already implemented in master:
 
- HTTP 429 is mapped to {{ProtocolStatus.EXCEPTION}} and triggers the 
exponential backoff (NUTCH-2946). See also
 - [FetchItemQueues|https://github.com/apache/nutch/blob/2786b5a9baefc4803964a4c5197cfbfe32f8988e/src/java/org/apache/nutch/fetcher/FetchItemQueues.java#L334] for the implementation
 - [nutch-default.xml|https://github.com/apache/nutch/blob/2786b5a9baefc4803964a4c5197cfbfe32f8988e/conf/nutch-default.xml#L1171] for the configuration of the backoff
 - NUTCH-2573 if an HTTP 429 is already seen when fetching the robots.txt
 - NUTCH-3067, NUTCH-3072, NUTCH-2992, NUTCH-2947 and NUTCH-2767 which fix bugs 
and improve the performance when handling throttled queues, especially if there 
are many of them
 - NUTCH-3114 to keep the fetching from stalling when only URLs from 
throttled queues are left
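
For illustration only, the per-queue exponential backoff mentioned above can be sketched roughly as follows. This is not the code in FetchItemQueues: the class {{QueueBackoffSketch}}, its methods and the delay values are hypothetical, and the sketch only assumes that each failed fetch (for example an HTTP 429) doubles the extra delay for that queue up to a cap, and that a successful fetch resets it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-queue exponential backoff; not the Nutch implementation. */
public class QueueBackoffSketch {
  private final long initialDelayMs;
  private final long maxDelayMs;
  // Number of consecutive failures seen per queue (e.g. per host).
  private final Map<String, Integer> exceptionCounts = new HashMap<>();

  public QueueBackoffSketch(long initialDelayMs, long maxDelayMs) {
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /** Records a failure (such as an HTTP 429) and returns the extra delay for the queue. */
  public long onException(String queueId) {
    int failures = exceptionCounts.merge(queueId, 1, Integer::sum);
    int shift = Math.min(failures - 1, 30);
    // Saturate at the cap instead of overflowing when shifting.
    if (initialDelayMs > (maxDelayMs >> shift)) {
      return maxDelayMs;
    }
    return Math.min(initialDelayMs << shift, maxDelayMs);
  }

  /** Resets the backoff for a queue after a successful fetch. */
  public void onSuccess(String queueId) {
    exceptionCounts.remove(queueId);
  }
}
```

With an initial delay of 1000 ms and a cap of 60000 ms, consecutive failures on one queue would yield 1000, 2000, 4000, ... ms, and other queues are unaffected.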


> Automatically increase crawl-delay on HTTP 429
> ----------------------------------------------
>
>                 Key: NUTCH-3120
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3120
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Major
>             Fix For: 1.22
>
>         Attachments: NUTCH-3120-1.15.patch
>
>
> I thought I remembered a discussion or ticket on this subject, but it seems no 
> code of this sort is in master at the moment.
> Anyway, here is a small patch that adds HTTP429 to ProtocolStatus, the setter for that 
> status to HttpBase, and the reading of that status in FetcherThread, so it 
> can adjust the fetch speed (hardcoded to *3 right now).
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
