Sebastian Nagel created NUTCH-3080:
--------------------------------------

             Summary: Injector and CrawlDbMerger to keep lockfile if CrawlDb 
install failed
                 Key: NUTCH-3080
                 URL: https://issues.apache.org/jira/browse/NUTCH-3080
             Project: Nutch
          Issue Type: Bug
          Components: crawldb, injector
    Affects Versions: 1.20
            Reporter: Sebastian Nagel
             Fix For: 1.21


(see the discussion in NUTCH-3078)

Injector and CrawlDbMerger should keep the CrawlDb lockfile if the CrawlDb 
installation fails, because a failed installation may leave an incomplete 
CrawlDb or even cause data loss. For comparison, see 
[CrawlDb.update(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/CrawlDb.java#L145]
 or 
[DeduplicationJob.run(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/DeduplicationJob.java].

In addition, a clear message should state that the lockfile is kept because 
the CrawlDb could be "damaged", which requires manual cleanup or a recovery 
action.
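
The requested behaviour could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: plain java.nio calls stand in for the Hadoop FileSystem and Nutch LockUtil calls, and the class name, the method names and the wording of the message are hypothetical. The "current"/"old" directory layout and the ".locked" file name are likewise assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch only: java.nio stands in for the Hadoop/Nutch calls. */
class CrawlDbInstallSketch {

  /** Lock file name assumed for this sketch. */
  static final String LOCK_NAME = ".locked";

  /** Deletes a file or directory tree; does nothing if the path is absent. */
  static void deleteRecursively(Path p) throws IOException {
    if (Files.notExists(p)) return;
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(p)) {
      // Reverse order so that children are deleted before their parents.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path q : paths) Files.delete(q);
  }

  /** Moves "current" to "old", then the job output to "current". */
  static void install(Path crawlDb, Path newDb) throws IOException {
    Path current = crawlDb.resolve("current");
    Path old = crawlDb.resolve("old");
    deleteRecursively(old);
    if (Files.exists(current)) Files.move(current, old);
    // If this move fails, the CrawlDb is left without a "current" directory.
    Files.move(newDb, current);
  }

  /** Removes the lock only after a successful install; returns success. */
  static boolean installAndUnlock(Path crawlDb, Path newDb) {
    Path lock = crawlDb.resolve(LOCK_NAME);
    try {
      install(crawlDb, newDb);
    } catch (IOException e) {
      // Hypothetical message text: keep the lock and say why.
      System.err.println("CrawlDb install failed, keeping lock file " + lock
          + ": the CrawlDb may be damaged and needs manual cleanup (" + e + ")");
      return false;
    }
    try {
      Files.deleteIfExists(lock);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return true;
  }

  /** Runs a failing and a succeeding install in a temporary directory. */
  static boolean[] demo() {
    try {
      Path tmp = Files.createTempDirectory("crawldb-sketch");
      Path crawlDb = Files.createDirectories(tmp.resolve("crawldb"));
      Files.createDirectories(crawlDb.resolve("current"));
      Path lock = Files.createFile(crawlDb.resolve(LOCK_NAME));

      // Failure case: the job output is missing, so the install fails.
      boolean failed = !installAndUnlock(crawlDb, tmp.resolve("missing-output"));
      boolean lockKept = failed && Files.exists(lock);

      // Success case: the job output exists, so the lock is removed.
      Path newDb = Files.createDirectories(tmp.resolve("job-output"));
      boolean ok = installAndUnlock(crawlDb, newDb);
      boolean lockRemoved = ok && Files.notExists(lock)
          && Files.isDirectory(crawlDb.resolve("current"));

      deleteRecursively(tmp);
      return new boolean[] {lockKept, lockRemoved};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point of the sketch is the ordering: the lock is released only on the success path. A failed install therefore leaves the lock in place, together with a message that explains why it is there.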



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
