[jira] [Comment Edited] (NUTCH-3078) Database is not unlocked when injector fails

Hiran Chaudhuri (Jira) Mon, 14 Oct 2024 23:21:34 -0700


    [ 
https://issues.apache.org/jira/browse/NUTCH-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889463#comment-17889463
 ]


Hiran Chaudhuri edited comment on NUTCH-3078 at 10/15/24 6:20 AM:
------------------------------------------------------------------

The required pattern is:

{{Path lock = LockUtil.createLockFile(...);}}
{{try {}}
{{    // whatever needs to be done to the DB}}
{{} finally {}}
{{    LockUtil.removeLockFile(...);}}
{{}}}
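
To make the pattern concrete, here is a minimal, self-contained sketch. It is an assumption-laden illustration, not Nutch code: the helper methods below are hypothetical stand-ins for LockUtil.createLockFile() and LockUtil.removeLockFile(), and they use java.nio.file on the local filesystem instead of the filesystem API that Nutch works with. The class name and withLock() are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFinallySketch {

    /** Hypothetical stand-in for LockUtil.createLockFile: fails if the lock already exists. */
    static Path createLockFile(Path dir) throws IOException {
        // Files.createFile throws FileAlreadyExistsException if the file is already there.
        return Files.createFile(dir.resolve(".locked"));
    }

    /** Hypothetical stand-in for LockUtil.removeLockFile. */
    static void removeLockFile(Path lock) throws IOException {
        Files.deleteIfExists(lock);
    }

    /** Runs the work while holding the lock; the lock is removed even if the work throws. */
    static void withLock(Path dir, Runnable work) throws IOException {
        Path lock = createLockFile(dir);
        try {
            work.run();
        } finally {
            removeLockFile(lock);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crawldb");
        try {
            // Simulate a job that fails after the lock has been taken.
            withLock(dir, () -> { throw new IllegalStateException("simulated injector failure"); });
        } catch (IllegalStateException e) {
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println("lock still present: " + Files.exists(dir.resolve(".locked")));
    }
}
```

Because the lock is created before the try block, the finally clause only runs when the lock was actually acquired, so a failed acquisition never deletes somebody else's lock.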

An alternative might be to modify createLockFile() so that it not only creates 
the file but also calls 
[File.deleteOnExit()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/File.html#deleteOnExit()].
 However, the lock would then be removed only when the JVM terminates, so this 
works only if the JVM exits after each operation.

Yet another pattern, maybe easier to remember, would be 
[try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html].
 If we add a wrapper LockFile to LockUtil that implements AutoCloseable, 
applying it would look like:

{{try (LockFile lockfile = new LockFile(...)) {}}
{{    // whatever needs to be done to the DB}}
{{}}}
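
A rough sketch of what such an AutoCloseable wrapper might look like follows. This is only an illustration under assumptions: LockFile does not exist in Nutch today, the constructor signature is invented, and the example uses java.nio.file on the local filesystem where the real LockUtil would go through its own filesystem calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFileSketch {

    /** Hypothetical wrapper: acquires the lock in the constructor, releases it in close(). */
    static final class LockFile implements AutoCloseable {
        private final Path lock;

        LockFile(Path dir) throws IOException {
            // Throws FileAlreadyExistsException if another job already holds the lock.
            this.lock = Files.createFile(dir.resolve(".locked"));
        }

        @Override
        public void close() throws IOException {
            Files.deleteIfExists(lock);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crawldb");
        try (LockFile lockFile = new LockFile(dir)) {
            // Simulate a job that fails while the lock is held.
            throw new IllegalStateException("simulated injector failure");
        } catch (IllegalStateException e) {
            // close() has already run by the time this handler executes.
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println("lock still present: " + Files.exists(dir.resolve(".locked")));
    }
}
```

If the constructor throws because the lock already exists, no resource was acquired and close() is never called, so an existing lock held by another job stays in place.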



 



> Database is not unlocked when injector fails
> --------------------------------------------
>
>                 Key: NUTCH-3078
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3078
>             Project: Nutch
>          Issue Type: Bug
>          Components: injector
>    Affects Versions: 1.21
>         Environment: Ubuntu 22 LTS
> $JAVA_HOME/bin/java -version
> openjdk version "21.0.4" 2024-07-16 LTS
> OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
> OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, 
> sharing)
>            Reporter: Hiran Chaudhuri
>            Priority: Major
>
> The injector locks the database but does not unlock it in case of failure. 
> This is a problem on the next invocation. To reproduce this, start with a 
> nonexistent crawldb and a nonexistent seed directory:
> {{./local/bin/nutch inject crawl/crawldb urls}}
> The crawldb is created and locked, but then the injector fails with
> {{2024-10-14 07:43:20,091 ERROR org.apache.nutch.crawl.Injector [main] 
> Injector: java.io.FileNotFoundException: File urls does not exist}}
> {{    at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:733)}}
> {{    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2078)}}
> {{    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2122)}}
> {{    at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:970)}}
> {{    at org.apache.nutch.crawl.Injector.inject(Injector.java:418)}}
> {{    at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
> {{    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
> {{    at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}
> Well, the urls directory indeed does not exist. So let's run the same job 
> with the correct directory:
> {{./local/bin/nutch inject crawl/crawldb ../urls}}
> Even though the directory is now correct, the Injector fails with
> {{2024-10-14 07:43:30,147 ERROR org.apache.nutch.crawl.Injector [main] 
> Injector: java.io.IOException: lock file crawl/crawldb/.locked already 
> exists.}}
> {{    at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:50)}}
> {{    at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:80)}}
> {{    at org.apache.nutch.crawl.CrawlDb.lock(CrawlDb.java:193)}}
> {{    at org.apache.nutch.crawl.Injector.inject(Injector.java:404)}}
> {{    at org.apache.nutch.crawl.Injector.run(Injector.java:574)}}
> {{    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)}}
> {{    at org.apache.nutch.crawl.Injector.main(Injector.java:538)}}
> I'd expect the lock on the DB to be removed when the Injector finishes, 
> whether it succeeds or not.


