Sebastian Nagel created NUTCH-3080:
--------------------------------------
Summary: Injector and CrawlDbMerger to keep lockfile if CrawlDb
install failed
Key: NUTCH-3080
URL: https://issues.apache.org/jira/browse/NUTCH-3080
Project: Nutch
Issue Type: Bug
Components: crawldb, injector
Affects Versions: 1.20
Reporter: Sebastian Nagel
Fix For: 1.21
(see the discussion in NUTCH-3078)
Injector and CrawlDbMerger should keep the CrawlDb lockfile if the CrawlDb
installation fails, because a failed installation may leave the CrawlDb
incomplete or even cause data loss.
For comparison, see
[CrawlDb.update(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/CrawlDb.java#L145]
or
[DeduplicationJob.run(...)|https://github.com/apache/nutch/blob/4a61208f492613f2c5282741e64c036acabeb71e/src/java/org/apache/nutch/crawl/DeduplicationJob.java].
In addition, there should be a clear message that the lockfile is kept because
the CrawlDb could be "damaged", which requires manual cleanup or a recovery
action.
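
The requested behaviour could be sketched roughly as follows. This is an illustration only and not the Nutch code: it uses java.nio.file instead of Hadoop's FileSystem, and the class name, the method name, the ".locked" file name and the "current"/"old" layout are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LockKeepingInstallSketch {

  // Assumed lock file name, chosen for this sketch only.
  static final String LOCK_NAME = ".locked";

  /**
   * Installs newDb as crawlDb/current and removes the lock only if every
   * step succeeded. On failure the lock is kept and a message explains why.
   * Does not handle a pre-existing "old" directory; a real implementation
   * would have to remove or rotate it first.
   */
  static boolean installAndUnlock(Path crawlDb, Path newDb) {
    Path lock = crawlDb.resolve(LOCK_NAME);
    Path current = crawlDb.resolve("current");
    Path old = crawlDb.resolve("old");
    try {
      if (Files.exists(current)) {
        // Step 1: keep the previous version as a backup.
        Files.move(current, old);
      }
      // Step 2: move the new version into place. If this fails after
      // step 1 succeeded, the CrawlDb has no "current" directory.
      Files.move(newDb, current);
      // Step 3: unlock only after a complete installation.
      Files.deleteIfExists(lock);
      return true;
    } catch (IOException e) {
      System.err.println("CrawlDb installation failed: " + e);
      System.err.println("Keeping lock file " + lock
          + " because the CrawlDb may be incomplete."
          + " Check and repair it manually before removing the lock.");
      return false;
    }
  }
}
```

The point of the sketch is that the unlock sits on the success path only, so a later job cannot start on a half-installed CrawlDb without someone first inspecting it.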