GitHub user sebastian-nagel opened a pull request:

    https://github.com/apache/nutch/pull/108

    NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / lastModified not always set
    
     - set modified time (time of last successful fetch) by DefaultFetchSchedule
       and AdaptiveFetchSchedule, but only if the document is actually modified
     - update unit tests to check whether modification time is properly set
     - set modified time (sent by responding server in HTTP header) in
       ProtocolOutput: FetchSchedule implementations can access the HTTP
       modified time from CrawlDatum's metadata (PROTO_STATUS_KEY = "_pst_")
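A minimal, hypothetical sketch of the rule in the first bullet: the fetch time is always updated after a successful fetch, but the modified time only advances when the document actually changed. The class, enum, and method names are invented for illustration and are not Nutch's FetchSchedule or CrawlDatum API; treating an "unknown" state like "not modified" is also an assumption.

```java
// Hypothetical sketch: names are illustrative, not Nutch's actual API.
class ModifiedTimeSketch {

    // Assumed outcome of comparing a fetched page with the previous fetch.
    enum State { UNKNOWN, MODIFIED, NOT_MODIFIED }

    // Minimal stand-in for a crawl db entry (hypothetical).
    static final class CrawlEntry {
        long fetchTime;    // time of the last successful fetch
        long modifiedTime; // time of the last fetch that saw a change (0 = never)
    }

    // Record a successful fetch; advance the modified time only when the
    // document actually changed. For NOT_MODIFIED and (by assumption)
    // UNKNOWN, the previous modified time is kept.
    static void onSuccessfulFetch(CrawlEntry entry, long fetchTime, State state) {
        entry.fetchTime = fetchTime;
        if (state == State.MODIFIED) {
            entry.modifiedTime = fetchTime;
        }
    }

    public static void main(String[] args) {
        CrawlEntry entry = new CrawlEntry();
        onSuccessfulFetch(entry, 1000L, State.MODIFIED);     // first fetch: content changed
        onSuccessfulFetch(entry, 2000L, State.NOT_MODIFIED); // refetch: unchanged
        // Prints "fetch=2000 modified=1000"
        System.out.println("fetch=" + entry.fetchTime + " modified=" + entry.modifiedTime);
    }
}
```

After an unchanged refetch the entry still reports the time of the last fetch that saw new content, which is the consistency the PR title asks for.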

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sebastian-nagel/nutch NUTCH-2164

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #108
    
----
commit b0c2969e47a3129a0abd0f98b736616ebaf5b540
Author: Sebastian Nagel
Date:   2016-03-11T21:55:24Z

    NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / lastModified not always set
     - set modified time (time of last successful fetch) by DefaultFetchSchedule
       and AdaptiveFetchSchedule, but only if the document is actually modified
     - update unit tests to check whether modification time is properly set
     - set modified time (sent by responding server in HTTP header) in
       ProtocolOutput: FetchSchedule implementations can access the HTTP
       modified time from CrawlDatum's metadata (PROTO_STATUS_KEY = "_pst_")

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes to enable it, or if the feature is enabled but not working,
please contact the infrastructure team or file a JIRA ticket with INFRA.
---
