Hiran Chaudhuri created NUTCH-3078:
--------------------------------------

Summary: Database is not unlocked when injector fails
Key: NUTCH-3078
URL: https://issues.apache.org/jira/browse/NUTCH-3078
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.21
Environment: Ubuntu 22 LTS
$JAVA_HOME/bin/java -version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)
Reporter: Hiran Chaudhuri

The injector locks the database but does not unlock it when the injection fails. The stale lock then makes the next invocation fail as well.

To reproduce this, start with a non-existing crawldb and a non-existing seed directory:

{{./local/bin/nutch inject crawl/crawldb urls}}

The crawldb is created and locked, but then the injector fails with

{{2024-10-14 07:43:20,091 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.io.FileNotFoundException: File urls does not exist}}
{{ at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:733)}}
{{ at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2078)}}
{{ at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2122)}}
{{ at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:970)}}
{{ at org.apache.nutch.crawl.Injector.inject(Injector.java:418)}}
{{ at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
{{ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
{{ at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}

Well, the urls directory indeed does not exist.
So let's run the same job with the correct directory:

{{./local/bin/nutch inject crawl/crawldb ../urls}}

Even though the seed directory is now correct, the Injector fails with

{{2024-10-14 07:43:30,147 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists.}}
{{ at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:50)}}
{{ at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:80)}}
{{ at org.apache.nutch.crawl.CrawlDb.lock(CrawlDb.java:193)}}
{{ at org.apache.nutch.crawl.Injector.inject(Injector.java:404)}}
{{ at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
{{ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
{{ at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}

I'd expect the lock on the DB to be removed when the Injector finishes, whether it succeeds or not.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
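
The expected behaviour described in this report (the lock is removed whether the run succeeds or fails) can be illustrated with a small, self-contained sketch. This is not the Nutch code and not a proposed patch: the class `LockReleaseSketch` and the methods `createLock` and `inject` are hypothetical, and the sketch uses plain `java.nio.file` instead of Hadoop's `FileSystem` or Nutch's `LockUtil`. It only assumes that the lock is a marker file named `.locked` inside the crawldb directory, as the second error message suggests.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockReleaseSketch {

    /** Hypothetical stand-in for creating the lock file; fails if it already exists. */
    static void createLock(Path lock) throws IOException {
        Files.createDirectories(lock.getParent());
        try {
            Files.createFile(lock);
        } catch (FileAlreadyExistsException e) {
            throw new IOException("lock file " + lock + " already exists.", e);
        }
    }

    /** Hypothetical inject step: take the lock, do the work, always release the lock. */
    static void inject(Path crawlDb, Path seedDir) throws IOException {
        Path lock = crawlDb.resolve(".locked");
        // Acquire outside the try block: if the lock belongs to someone else,
        // this throws and the finally block below never deletes their lock.
        createLock(lock);
        try {
            if (!Files.isDirectory(seedDir)) {
                throw new FileNotFoundException("File " + seedDir + " does not exist");
            }
            // ... the actual injection work would run here ...
        } finally {
            // Runs on success and on failure, so no stale lock is left behind.
            Files.deleteIfExists(lock);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("lock-sketch");
        Path crawlDb = tmp.resolve("crawl").resolve("crawldb");
        Path seeds = tmp.resolve("urls");

        try {
            inject(crawlDb, seeds); // seed directory is missing, so this fails
        } catch (IOException e) {
            System.out.println("first run failed: " + e.getMessage());
        }
        System.out.println("lock left behind: " + Files.exists(crawlDb.resolve(".locked")));

        Files.createDirectories(seeds);
        inject(crawlDb, seeds); // succeeds because the first run released the lock
        System.out.println("second run succeeded");
    }
}
```

In this sketch the second call to `inject` succeeds because the `finally` block removed the marker file after the first failure. Whether the real fix belongs in a `finally` block inside `Injector.inject` or elsewhere is for the Nutch maintainers to decide.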