[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239106#comment-16239106 ]

ASF GitHub Bot commented on NUTCH-2242:
---------------------------------------

Omkar20895 opened a new pull request #238: NUTCH-2242 Injector to stop if job 
fails to avoid loss of CrawlDb
URL: https://github.com/apache/nutch/pull/238
 
 
   - Added job status checks to the following classes: Injector, ReadHostDb,
CrawlCompletionStats, ProtocolStatusStatistics, SitemapProcessor and
DomainStatistics.



> lastModified not always set
> ---------------------------
>
>                 Key: NUTCH-2242
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2242
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.11
>            Reporter: Jurian Broertjes
>            Priority: Minor
>             Fix For: 1.13
>
>         Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not
>   updated on the first successful fetch.
> - When a document modification is detected (protocol- or signature-wise), the
>   modifiedTime isn't updated.
> I can provide a patch later today.
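The behaviour the report asks for can be sketched as a single update rule: on a successful fetch, set the modified time if it has never been set (first fetch) or if a modification was detected. This is a simplified illustration under stated assumptions, not the attached NUTCH-2242.patch: Record, FetchState and onSuccessfulFetch are hypothetical stand-ins for Nutch's CrawlDatum and fetch-schedule logic, and the sketch assumes the fetch time is used as the new modified time.

```java
/**
 * Hypothetical sketch (not Nutch's CrawlDatum/FetchSchedule API): keep a
 * record's modified time current on the first successful fetch and whenever
 * a modification is detected.
 */
public class ModifiedTimeSketch {

    /** Hypothetical, simplified stand-in for a CrawlDb record. */
    public static final class Record {
        public long fetchTime;     // time of the last fetch (ms since epoch)
        public long modifiedTime;  // 0 means "never set"
    }

    /** Hypothetical fetch outcomes relevant to the report. */
    public enum FetchState { NOT_MODIFIED, MODIFIED, UNKNOWN }

    /**
     * Records a successful fetch. The modified time is set when it has never
     * been set (first fetch) or when the content is known to have changed,
     * whether detected by the protocol or by a signature comparison.
     */
    public static void onSuccessfulFetch(Record r, long fetchTime, FetchState state) {
        r.fetchTime = fetchTime;
        if (r.modifiedTime == 0L || state == FetchState.MODIFIED) {
            r.modifiedTime = fetchTime;  // assumption: fetch time stands in for modification time
        }
    }

    public static void main(String[] args) {
        Record r = new Record();
        onSuccessfulFetch(r, 1000L, FetchState.UNKNOWN);       // first fetch
        System.out.println(r.modifiedTime);                    // 1000
        onSuccessfulFetch(r, 2000L, FetchState.NOT_MODIFIED);  // unchanged content
        System.out.println(r.modifiedTime);                    // 1000
        onSuccessfulFetch(r, 3000L, FetchState.MODIFIED);      // change detected
        System.out.println(r.modifiedTime);                    // 3000
    }
}
```

The two branches of the condition correspond one-to-one to the two issues in the report.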



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
