Thank you, Tom.
Which config XML variables control this?

Thank you in advance,
Vladimir.


-----Original Message-----
From: Tom Chiverton [mailto:[email protected]] 
Sent: November-25-16 2:31 AM
To: [email protected]
Subject: Re: Nutch 2.3.1 re-crawls unchanged web pages

As I understand it, this is expected, especially if the page is in the seed list.

You can control this by changing the relevant config XML variables. 
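
A minimal sketch of such overrides, assuming the property names from the stock conf/nutch-default.xml apply to your setup. The values are illustrative only, and the overrides would go in conf/nutch-site.xml:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property: default number of seconds before a page is
       considered due for re-fetch. The value here (7 days) is illustrative. -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>604800</value>
  </property>

  <!-- Assumed property: the fetch schedule implementation. The adaptive
       schedule adjusts a page's re-fetch interval depending on whether the
       page was found modified; it does not by itself skip the fetch. -->
  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>
</configuration>
```

Check the description of each property in nutch-default.xml for your Nutch version before relying on it.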

On 24 November 2016 20:10:02 GMT+00:00, Vladimir Loubenski 
<[email protected]> wrote:
>Hi,
>I am using Nutch 2.3.1.
>I run the generate, fetch, parse and updateDB steps in a loop.
>I noticed that during a re-crawl, even if a web page has not changed, Nutch
>does not detect this from the ETag, Last-Modified or signature fields
>and continues to process all these steps for unchanged web pages.
>Is this expected behaviour?
>Are there plans to fix it in future releases?
>
>Regards,
>Vladimir.
>
