Sebastian Nagel created NUTCH-3080:
--------------------------------------

             Summary: Injector and CrawlDbMerger to keep lockfile if CrawlDb install failed
                 Key: NUTCH-3080
                 URL: https://issues.apache.org/jira/browse/NUTCH-3080
             Project: Nutch
          Issue Type: Bug
          Components: crawldb, injector
    Affects Versions: 1.20
            Reporter: Sebastian Nagel
             Fix For: 1.21

(See the discussion in NUTCH-3078.)

Injector and CrawlDbMerger should keep the CrawlDb lockfile if the CrawlDb installation fails, because a failed installation may leave the CrawlDb incomplete and thereby cause data loss. For comparison, see [CrawlDb.update(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/CrawlDb.java#L145] or [DeduplicationJob.run(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/DeduplicationJob.java].

In addition, there should be a clear message that the lockfile is kept because the CrawlDb could be "damaged", which requires manual cleanup or a recovery action.
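
The requested behaviour could look roughly like the following minimal sketch. It is not Nutch code: it uses only java.nio.file on a local directory, and the names LockfileSketch, InstallStep, installWithLock and the lock file name are hypothetical, chosen for illustration. The real fix would go through the Hadoop FileSystem and Nutch's own lock handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch only; all names here are assumptions, not Nutch API. */
class LockfileSketch {

  /** Hypothetical stand-in for the step that installs the new CrawlDb. */
  @FunctionalInterface
  interface InstallStep {
    void run(Path crawlDb) throws IOException;
  }

  /** Lock file name assumed for this sketch. */
  static final String LOCK_NAME = ".locked";

  /**
   * Creates the lock, runs the install step, and removes the lock only if
   * the install step succeeded. Returns true on success, false on failure.
   */
  static boolean installWithLock(Path crawlDb, InstallStep step)
      throws IOException {
    Path lock = crawlDb.resolve(LOCK_NAME);
    // Throws FileAlreadyExistsException if the CrawlDb is already locked.
    Files.createFile(lock);
    try {
      step.run(crawlDb);
    } catch (IOException e) {
      // Keep the lock and say why, so nobody works on a damaged CrawlDb.
      System.err.println("CrawlDb install failed: " + e.getMessage());
      System.err.println("Keeping lock file " + lock
          + " because the CrawlDb may be incomplete."
          + " Check it manually, then remove the lock file.");
      return false;
    }
    // Install succeeded: the CrawlDb is consistent, so release the lock.
    Files.delete(lock);
    return true;
  }
}
```

With this shape, a later run against the same directory fails at lock creation until someone has inspected the CrawlDb and removed the lock file by hand, which is the safeguard the issue asks for.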