[ https://issues.apache.org/jira/browse/NUTCH-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625746#comment-16625746 ]

Sebastian Nagel commented on NUTCH-2623:
----------------------------------------

Agreed. Now that https has become the default protocol and http is usually 
redirected to https, the legacy mode no longer makes sense. If there are no 
objections, I'll remove the legacy mode. Thanks, [~markus17]!

> Fetcher to guarantee delay for same host/domain/ip independent of http/https 
> protocol
> -------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2623
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2623
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.14
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
>  Fetcher uses a combination of protocol and host/domain/ip as ID for fetch 
> item queues, see 
> [FetchItem.java|https://github.com/apache/nutch/blob/2b93a66/src/java/org/apache/nutch/fetcher/FetchItem.java#L101].
>  This prevents a guaranteed delay if both http:// and https:// URLs are 
> fetched from the same host/domain/ip, e.g. here with a large delay of 30 sec.:
> {noformat}
> 2018-07-23 14:54:39,834 INFO fetcher.FetcherThread - FetcherThread 24 
> fetching http://nutch.apache.org/ (queue crawl delay=30000ms)
> 2018-07-23 14:54:39,846 INFO fetcher.FetcherThread - FetcherThread 23 
> fetching https://nutch.apache.org/ (queue crawl delay=30000ms)
> {noformat}
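To illustrate why a protocol-prefixed queue ID breaks the delay guarantee, here is a minimal Java sketch. It is not Nutch's FetchItem/FetchItemQueues code: the class `QueueIdSketch`, its methods, and the simplified delay model (a queue reserves the next fetch slot when a URL is dequeued, rather than after the fetch finishes) are hypothetical. It also keys queues by host only, ignoring the domain and IP modes. With the protocol in the ID, the two URLs from the log above land in separate queues and neither waits; without it, they share one queue and the second fetch is held back by the 30 s delay.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class QueueIdSketch {

    // Hypothetical fixed crawl delay, matching the 30000 ms in the log above.
    static final long CRAWL_DELAY_MS = 30_000L;

    // Earliest time (ms) at which the next fetch from each queue may start.
    private final Map<String, Long> nextFetchTime = new HashMap<>();
    private final boolean includeProtocol;

    public QueueIdSketch(boolean includeProtocol) {
        this.includeProtocol = includeProtocol;
    }

    /** Queue ID keyed by host; optionally prefixed with the URL scheme. */
    public static String queueId(String url, boolean includeProtocol) {
        URI uri = URI.create(url);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        if (includeProtocol) {
            // Per-protocol queues: http and https of one host are kept apart.
            return uri.getScheme().toLowerCase(Locale.ROOT) + "://" + host;
        }
        // Protocol-independent queue: one queue (and one delay) per host.
        return host;
    }

    /** Returns how long (ms) a fetch requested at nowMs must wait. */
    public long delayBeforeFetch(String url, long nowMs) {
        String id = queueId(url, includeProtocol);
        long earliest = nextFetchTime.getOrDefault(id, nowMs);
        long start = Math.max(earliest, nowMs);
        // Reserve the slot: the next fetch from this queue waits a full delay.
        nextFetchTime.put(id, start + CRAWL_DELAY_MS);
        return start - nowMs;
    }

    public static void main(String[] args) {
        String http = "http://nutch.apache.org/";
        String https = "https://nutch.apache.org/";
        for (boolean withProtocol : new boolean[] {true, false}) {
            QueueIdSketch queues = new QueueIdSketch(withProtocol);
            // The two requests are 12 ms apart, as in the log excerpt.
            long d1 = queues.delayBeforeFetch(http, 0L);
            long d2 = queues.delayBeforeFetch(https, 12L);
            System.out.println("includeProtocol=" + withProtocol
                    + ": waits " + d1 + " ms and " + d2 + " ms");
        }
    }
}
```

Dropping the scheme from the ID is what makes the delay hold across http:// and https://; the trade-off is that a site serving both protocols from one host is throttled as a single queue, which matches the intent of a per-host politeness delay.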



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
