Thank you, Tom.

Which config XML variables control this?

Thank you in advance,
Vladimir.

-----Original Message-----
From: Tom Chiverton [mailto:[email protected]]
Sent: November-25-16 2:31 AM
To: [email protected]
Subject: Re: Nutch 2.3.1 re-crawls unchanged web pages

I understand it's expected. Especially if the page is in the list of seeds. You can control this by changing the relevant config XML variables.

On 24 November 2016 20:10:02 GMT+00:00, Vladimir Loubenski <[email protected]> wrote:
>Hi ,
>I am using Nutch 2.3.1.
>I run in loop generate, fetch, parse, updateDB steps.
>I noted that during re-crawl even if a web page doesn't change nutch
>doesn't detect it by value of ETag, Last-Modified or signature fields
>and continue process all these steps for unchanged web pages.
> Is it expected behaviour?
>Are there plans to fix it in future releases?
>
>Regards,
>Vladimir.

