[ https://issues.apache.org/jira/browse/NUTCH-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419148#comment-13419148 ]
Markus Jelsma commented on NUTCH-1341:
--------------------------------------
Thanks. This has been running in production for quite some time now and has
kept the modifiedTime stable over successive fetches. We use it together with
the dateExtractorParseFilter, so the times no longer shift if we force a
complete reindex (we usually don't index notModified pages).
> NotModified time set to now but page not modified
> -------------------------------------------------
>
> Key: NUTCH-1341
> URL: https://issues.apache.org/jira/browse/NUTCH-1341
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.5
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.6
>
> Attachments: NUTCH-1341-1.6-1.patch
>
>
> Servers tend to respond with an incorrect value, or no value at all, for
> Last-Modified. When signatures match or when (fetch.getStatus() ==
> CrawlDatum.STATUS_FETCH_NOTMODIFIED), the reducer correctly sets the
> db_notmodified status for the CrawlDatum. The modifiedTime value, however, is
> not set accordingly: it is set to the current time even though the page has
> not changed.
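
For readers who want the behaviour spelled out, here is a minimal sketch of the rule the issue asks for: when a page is judged not modified, keep the previously stored modifiedTime instead of overwriting it with the fetch time. This is a toy model, not the attached patch and not Nutch code. The class ModifiedTimeSketch, the FetchStatus enum and the method names are all hypothetical, and the real reducer may handle more cases (for example an unset old modifiedTime).

```java
import java.util.Arrays;

/** Toy model of the desired rule. All names are hypothetical, not Nutch API. */
public class ModifiedTimeSketch {

    /** Simplified stand-in for the fetch status carried by a CrawlDatum. */
    enum FetchStatus { FETCH_SUCCESS, FETCH_NOTMODIFIED }

    /**
     * A page counts as not modified if the fetcher reported so, or if the
     * old and new content signatures are both present and identical.
     */
    static boolean isNotModified(FetchStatus status, byte[] oldSig, byte[] newSig) {
        if (status == FetchStatus.FETCH_NOTMODIFIED) {
            return true;
        }
        return oldSig != null && newSig != null && Arrays.equals(oldSig, newSig);
    }

    /**
     * Returns the modifiedTime to store: the old value when the page is not
     * modified, otherwise the time of the current fetch.
     */
    static long resolveModifiedTime(FetchStatus status, byte[] oldSig, byte[] newSig,
                                    long oldModifiedTime, long fetchTime) {
        return isNotModified(status, oldSig, newSig) ? oldModifiedTime : fetchTime;
    }

    public static void main(String[] args) {
        byte[] sigA = {1, 2, 3};
        byte[] sigB = {1, 2, 3};
        byte[] sigC = {9, 9, 9};

        // Same signature: the old modifiedTime is kept.
        System.out.println(resolveModifiedTime(FetchStatus.FETCH_SUCCESS, sigA, sigB, 1000L, 5000L));
        // Different signature: the page changed, so the fetch time is used.
        System.out.println(resolveModifiedTime(FetchStatus.FETCH_SUCCESS, sigA, sigC, 1000L, 5000L));
    }
}
```

Keeping the decision in one small function makes the intended effect easy to check: repeated fetches of an unchanged page return the same modifiedTime, which is the stability described in the comment above.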