[ https://issues.apache.org/jira/browse/NUTCH-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889463#comment-17889463 ]
Hiran Chaudhuri edited comment on NUTCH-3078 at 10/15/24 6:20 AM:
------------------------------------------------------------------

The required pattern is:

{{Path lock = LockUtil.createLockFile(...);}}
{{try {}}
{{ // whatever needs to be done to the DB}}
{{} finally {}}
{{ LockUtil.removeLockFile(...);}}
{{}}}

An alternative might be to modify createLockFile() so that it not only creates the file but also calls [File.deleteOnExit()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/File.html#deleteOnExit()]. However, the file would then only be deleted when the JVM terminates, so this works only if the JVM exits after a single operation.

Yet another pattern, possibly easier to remember, would be [Try with Resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]. We could introduce a wrapper class LockFile in LockUtil that implements AutoCloseable; applying it would then look like:

{{try (LockFile lockfile = new LockFile(...)) {}}
{{ // whatever needs to be done to the DB}}
{{}}}
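A minimal, self-contained sketch of both patterns from the comment (explicit try/finally and a try-with-resources wrapper), assuming plain {{java.nio.file}} in place of Nutch's {{LockUtil}} and Hadoop's {{FileSystem}}. {{LockFile}}, {{InjectorSketch}}, {{readSeeds}} and {{LockDemo}} are hypothetical names used only for illustration, not existing Nutch classes; the {{.locked}} file name and the error text merely mirror the log in the issue description.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Entry point first, so a single-file launch (`java LockDemo.java`) picks this class.
class LockDemo {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("nutch3078");
    Path crawlDb = dir.resolve("crawldb");
    Path seeds = Files.createDirectories(dir.resolve("seeds"));

    try {
      // First run: the seed directory "urls" does not exist, so the work fails.
      InjectorSketch.inject(crawlDb, dir.resolve("urls"));
    } catch (FileNotFoundException e) {
      System.out.println("first run failed: " + e.getMessage());
    }
    // close() already removed the lock, so the retry is not blocked.
    System.out.println("lock left behind: " + Files.exists(crawlDb.resolve(".locked")));
    InjectorSketch.inject(crawlDb, seeds);
    System.out.println("second run succeeded");
  }
}

/** Hypothetical AutoCloseable lock, sketching the proposed LockFile wrapper. */
final class LockFile implements AutoCloseable {
  private final Path path;

  LockFile(Path path) throws IOException {
    this.path = path;
    try {
      // Atomic create: fails if the lock file already exists (another job holds it).
      Files.createFile(path);
    } catch (FileAlreadyExistsException e) {
      throw new IOException("lock file " + path + " already exists.", e);
    }
  }

  @Override
  public void close() throws IOException {
    // Invoked by try-with-resources on normal exit and on exceptions alike.
    Files.deleteIfExists(path);
  }
}

/** Hypothetical stand-in for the injector; not Nutch code. */
final class InjectorSketch {
  // Pattern 2: try-with-resources with the LockFile wrapper.
  static void inject(Path crawlDb, Path seedDir) throws IOException {
    Files.createDirectories(crawlDb);
    try (LockFile lock = new LockFile(crawlDb.resolve(".locked"))) {
      readSeeds(seedDir); // whatever needs to be done to the DB
    }
  }

  // Pattern 1: explicit try/finally, as in the comment.
  static void injectWithFinally(Path crawlDb, Path seedDir) throws IOException {
    Files.createDirectories(crawlDb);
    Path lock = crawlDb.resolve(".locked");
    Files.createFile(lock); // acquired before `try`, so a failed acquire deletes nothing
    try {
      readSeeds(seedDir);
    } finally {
      Files.deleteIfExists(lock); // always released, even when readSeeds throws
    }
  }

  // Simulated DB work: fails like the injector does when the seed dir is missing.
  private static void readSeeds(Path seedDir) throws IOException {
    if (!Files.isDirectory(seedDir)) {
      throw new FileNotFoundException("File " + seedDir + " does not exist");
    }
  }
}
```

Running {{java LockDemo.java}} should show the first run failing on the missing {{urls}} directory, no lock left behind, and the retry succeeding. In both variants the lock is acquired outside the protected block (before {{try}}, or in the resource initializer): if acquisition fails because another job holds the lock, neither {{finally}} nor {{close()}} runs, so that job's lock is not deleted.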
> Database is not unlocked when injector fails
> --------------------------------------------
>
>                 Key: NUTCH-3078
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3078
>             Project: Nutch
>          Issue Type: Bug
>          Components: injector
>    Affects Versions: 1.21
>         Environment: Ubuntu 22 LTS
> $JAVA_HOME/bin/java -version
> openjdk version "21.0.4" 2024-07-16 LTS
> OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
> OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)
>            Reporter: Hiran Chaudhuri
>            Priority: Major
>
> The injector locks the database but does not unlock it when it fails. This is a problem on the next invocation. To reproduce, start with a non-existent crawldb and a non-existent seed directory:
> {{./local/bin/nutch inject crawl/crawldb urls}}
> The crawldb is created and locked, but then the injector fails with
> {{2024-10-14 07:43:20,091 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.io.FileNotFoundException: File urls does not exist}}
> {{ at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:733)}}
> {{ at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2078)}}
> {{ at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2122)}}
> {{ at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:970)}}
> {{ at org.apache.nutch.crawl.Injector.inject(Injector.java:418)}}
> {{ at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
> {{ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
> {{ at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}
> Well, the urls directory indeed does not exist.
> So let's run the same job with the correct directory:
> {{./local/bin/nutch inject crawl/crawldb ../urls}}
> Even though the directory is now correct, the Injector fails with
> {{2024-10-14 07:43:30,147 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.io.IOException: lock file crawl/crawldb/.locked already exists.}}
> {{ at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:50)}}
> {{ at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:80)}}
> {{ at org.apache.nutch.crawl.CrawlDb.lock(CrawlDb.java:193)}}
> {{ at org.apache.nutch.crawl.Injector.inject(Injector.java:404)}}
> {{ at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
> {{ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
> {{ at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}
> I'd expect the lock on the DB to be removed again when the Injector finishes, whether it succeeds or not.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)