[ 
https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279942#comment-15279942
 ] 

Sebastian Nagel commented on NUTCH-2242:
----------------------------------------

[~markus17]: Sorry, I didn't upload a final patch because the solution on 
GitHub (see 
[diff|https://github.com/apache/nutch/compare/master...sebastian-nagel:NUTCH-2164])
 has not been fully tested yet. I'll prepare a final patch / pull request.
[~jurian]: The modified time in CrawlDb is already set by 
AdaptiveFetchSchedule and (now) by DefaultFetchSchedule, so it does not really 
make sense to set it a second time. Also, if done at this place, it would 
overwrite a modified time that was detected by other means, e.g., by a 
signature comparison.
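
To illustrate the overwrite concern, here is a minimal, self-contained sketch. It 
does not use Nutch classes: Record, applySchedule and overwriteAfterSchedule are 
hypothetical names, and the logic is only an assumption about the general 
pattern, not the code of DefaultFetchSchedule or of the attached patch.

```java
public class ModifiedTimeSketch {

    // Hypothetical stand-in for a CrawlDb entry; NOT Nutch's CrawlDatum.
    static final class Record {
        long fetchTime;
        long modifiedTime; // 0 means "not known yet"
    }

    // Hypothetical schedule step: the one place that decides modifiedTime.
    // detectedModifiedTime > 0 models a time found by other means
    // (e.g. a protocol header or a signature comparison).
    static void applySchedule(Record r, long fetchTime, long detectedModifiedTime) {
        r.fetchTime = fetchTime;
        if (detectedModifiedTime > 0) {
            r.modifiedTime = detectedModifiedTime;
        } else if (r.modifiedTime == 0) {
            // First successful fetch with nothing better known: use fetch time.
            r.modifiedTime = fetchTime;
        }
    }

    // Hypothetical second write at a later point: it sets the time
    // unconditionally and so discards whatever the schedule decided.
    static void overwriteAfterSchedule(Record r, long fetchTime) {
        r.modifiedTime = fetchTime;
    }

    public static void main(String[] args) {
        Record r = new Record();
        applySchedule(r, 2000L, 1500L);
        System.out.println(r.modifiedTime); // 1500, the detected time is kept

        overwriteAfterSchedule(r, 2000L);
        System.out.println(r.modifiedTime); // 2000, the detected time is lost
    }
}
```

With a single writer, the detected time survives; with the second unconditional 
write, it is replaced by the fetch time.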

> lastModified not always set
> ---------------------------
>
>                 Key: NUTCH-2242
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2242
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.11
>            Reporter: Jurian Broertjes
>            Priority: Minor
>             Fix For: 1.12
>
>         Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not 
> updated on the first successful fetch. 
> - When a document modification is detected (protocol- or signature-wise), the 
> modifiedTime isn't updated
> I can provide a patch later today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
