[ 
https://issues.apache.org/jira/browse/NUTCH-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610487#comment-14610487
 ] 

Sebastian Nagel commented on NUTCH-2055:
----------------------------------------

Hi Talat,
* Do you really want a random but fixed crawl delay for every host? Isn't the 
goal to randomize the intervals between successive accesses to the same host? 
In that case, nextFetchTime in FetchItemQueue needs to be set to a new random 
value after each fetch/access, probably from FetchItemQueue.setEndTime().
* Shouldn't the random delay be chosen between "fetcher.server.delay" and 
"fetcher.max.crawl.delay", to guarantee a certain minimum delay? If multiple 
FetcherThreads access the same host ("fetcher.threads.per.queue" > 1), the 
minimum is consequently "fetcher.server.min.delay".
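
To illustrate both points, here is a minimal sketch of a per-host queue that 
draws a fresh delay after every fetch, bounded below by a minimum. It is not 
the Nutch code: RandomDelayQueue, nextDelayMs() and the constructor parameters 
are hypothetical names, and only setEndTime() and nextFetchTime mirror the 
FetchItemQueue members mentioned above. The bounds are assumed to come from 
the configured minimum and maximum delays, in milliseconds.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a per-host fetch queue; not Nutch's FetchItemQueue. */
public class RandomDelayQueue {
    private final long minDelayMs;   // assumed: derived from the minimum delay setting
    private final long maxDelayMs;   // assumed: derived from the maximum crawl delay setting
    private final Random random;
    private final AtomicLong nextFetchTime = new AtomicLong();

    public RandomDelayQueue(long minDelayMs, long maxDelayMs, Random random) {
        if (minDelayMs < 0 || maxDelayMs < minDelayMs) {
            throw new IllegalArgumentException("need 0 <= minDelayMs <= maxDelayMs");
        }
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.random = random;
    }

    /** Draws a delay uniformly from [minDelayMs, maxDelayMs]. */
    long nextDelayMs() {
        long span = maxDelayMs - minDelayMs;
        // nextDouble() is in [0, 1), so the offset lies in [0, span].
        return minDelayMs + (long) (random.nextDouble() * (span + 1));
    }

    /** Called when a fetch finishes: picks a new random wait before the next access. */
    public void setEndTime(long endTimeMs) {
        nextFetchTime.set(endTimeMs + nextDelayMs());
    }

    public long getNextFetchTime() {
        return nextFetchTime.get();
    }

    public static void main(String[] args) {
        RandomDelayQueue queue = new RandomDelayQueue(1000L, 5000L, new Random());
        long now = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            queue.setEndTime(now);
            System.out.println("wait before next fetch (ms): "
                + (queue.getNextFetchTime() - now));
        }
    }
}
```

Because the delay is drawn in setEndTime() rather than once per host, every 
interval between two accesses to the same host differs, while the lower bound 
keeps the fetcher from hitting the host faster than the configured minimum.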


> Random Crawl Delay
> ------------------
>
>                 Key: NUTCH-2055
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2055
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 2.3
>            Reporter: Talat UYARER
>            Priority: Trivial
>             Fix For: 2.4
>
>         Attachments: NUTCH-2055.patch
>
>
> Some firewalls block requests that arrive with a constant delay between 
> them. I created a patch that uses a random crawl delay between 0 and the 
> maximum crawl delay setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
