[ 
https://issues.apache.org/jira/browse/NUTCH-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419148#comment-13419148
 ] 

Markus Jelsma commented on NUTCH-1341:
--------------------------------------

Thanks. This has been running in production for quite some time now and has
kept the modifiedTime stable over successive fetches. We use it together with
the dateExtractorParseFilter; now the times don't shift if we force a complete
reindex (we usually don't index notModified pages).
                
> NotModified time set to now but page not modified
> -------------------------------------------------
>
>                 Key: NUTCH-1341
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1341
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.5
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.6
>
>         Attachments: NUTCH-1341-1.6-1.patch
>
>
> Servers often respond with an incorrect Last-Modified value, or with none at
> all. By comparing signatures, or when (fetch.getStatus() ==
> CrawlDatum.STATUS_FETCH_NOTMODIFIED), the reducer correctly sets the
> db_notmodified status for the CrawlDatum. The modifiedTime value, however, is
> not set accordingly.
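The intended reducer behaviour can be sketched as below. This is a hypothetical, simplified model, not the NUTCH-1341 patch, and it does not use Nutch's real CrawlDatum class. The names ModifiedTimeSketch, Record, update, DB_FETCHED and DB_NOTMODIFIED are illustrative stand-ins. The fetchNotModified flag plays the role of fetch.getStatus() == CrawlDatum.STATUS_FETCH_NOTMODIFIED. When the page is judged unchanged, the old modifiedTime is carried over instead of being reset to "now".

```java
// Hypothetical, simplified model of the CrawlDb update decision.
// NOT Nutch's CrawlDatum API; names and constants are illustrative only.
public class ModifiedTimeSketch {

    // Stand-ins for db_fetched / db_notmodified status codes (values are arbitrary).
    static final int DB_FETCHED = 1;
    static final int DB_NOTMODIFIED = 2;

    // Minimal per-URL record: status, last modification time, content signature.
    static final class Record {
        final int status;
        final long modifiedTime;
        final String signature;

        Record(int status, long modifiedTime, String signature) {
            this.status = status;
            this.modifiedTime = modifiedTime;
            this.signature = signature;
        }
    }

    // Merge the previous record with a fresh fetch result.
    // fetchNotModified: the server answered "not modified" (e.g. HTTP 304).
    // If that flag is set, or the signature is unchanged, keep the OLD modifiedTime.
    static Record update(Record old, boolean fetchNotModified, String newSignature, long now) {
        boolean unchanged = fetchNotModified
                || (old != null && old.signature != null && old.signature.equals(newSignature));
        if (unchanged && old != null) {
            // A not-modified response may carry no content, so fall back to the old signature.
            String sig = newSignature != null ? newSignature : old.signature;
            return new Record(DB_NOTMODIFIED, old.modifiedTime, sig);
        }
        // First fetch or changed content: the page really was modified now.
        return new Record(DB_FETCHED, now, newSignature);
    }

    public static void main(String[] args) {
        Record first = update(null, false, "abc", 1000L);   // first fetch
        Record second = update(first, false, "abc", 2000L); // same signature
        Record third = update(second, false, "xyz", 3000L); // content changed
        System.out.println(second.status + " " + second.modifiedTime); // prints "2 1000"
        System.out.println(third.status + " " + third.modifiedTime);   // prints "1 3000"
    }
}
```

In this model, repeated fetches of an unchanged page keep the first modification time, and a real content change moves it forward. That is the stability described in the comment above.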

--
This message is automatically generated by JIRA.

        
