Hi Cesare,

Hmm... good catch!

The modifiedTime is also set in CrawlDbReducer.reduce, right after
FetchSchedule.setFetchSchedule is called, if the signature hasn't
changed compared to the previous fetch, cf. NUTCH-1341.

At first glance, it looks like the modifiedTime is indeed never set
by DefaultFetchSchedule.
I'll have a more detailed look at this and get back to you soon.
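
To make the consequence concrete, here is a minimal, self-contained
sketch. It is not Nutch code: the class and all method names are made up
for illustration, and only the fallback line is taken from the quoted
message. It assumes timestamps are epoch milliseconds and that a
conditional request is only possible when modifiedTime is greater than
zero.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Nutch.
public class ModifiedTimeSketch {

    // HTTP date format (e.g. "Sun, 06 Nov 1994 08:49:37 GMT"), always in GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                         .withZone(ZoneOffset.UTC);

    // Schedule with the fallback from the quoted line:
    // an unknown modifiedTime (<= 0) is replaced by the fetch time.
    static long withFallback(long modifiedTime, long fetchTime) {
        if (modifiedTime <= 0) modifiedTime = fetchTime;
        return modifiedTime;
    }

    // Schedule without the fallback: an unknown modifiedTime stays unknown.
    static long withoutFallback(long modifiedTime, long fetchTime) {
        return modifiedTime;
    }

    // Builds an If-Modified-Since value, or nothing if modifiedTime is unknown.
    static Optional<String> ifModifiedSince(long modifiedTime) {
        if (modifiedTime <= 0) return Optional.empty();
        return Optional.of(HTTP_DATE.format(Instant.ofEpochMilli(modifiedTime)));
    }

    public static void main(String[] args) {
        long fetchTime = 784111777000L; // Sun, 06 Nov 1994 08:49:37 GMT
        long unknown = 0L;              // modifiedTime was never set

        System.out.println("with fallback:    "
            + ifModifiedSince(withFallback(unknown, fetchTime)));
        System.out.println("without fallback: "
            + ifModifiedSince(withoutFallback(unknown, fetchTime)));
    }
}
```

Without the fallback no header value can be built, so the server never
gets the chance to answer "not modified".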

Thanks,
Sebastian

On 11/15/2012 12:33 PM, Cesare Zavattari wrote:
> Hi all,
> AdaptiveFetchSchedule has the following line:
> 
> if (modifiedTime <= 0) modifiedTime = fetchTime;
> 
> that DefaultFetchSchedule does not have. This seems to
> prevent DefaultFetchSchedule from correctly handling possible 304
> responses (modifiedTime seems to be always zero and HttpRequest.java
> doesn't set the If-Modified-Since request header).
> 
> This is true for both nutch 1.x and 2.x.
> 
> Is this the expected behaviour?
> 
> Thanks
> Bye
> 
