Certainly very flaky. I did make a mental note of this when I was reorganising the properties for the release candidate. We should fix this one ASAP now that we see it doesn't work on some VMs.
Fred Toussi

----- Original Message -----
From: "Campbell Boucher-Burnet" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 07 March 2002 10:51
Subject: Re: [Hsqldb-developers] The database is already in use by another process

Sounds like a problem in Log.isAlreadyOpen...note the "// todo: check if this works in all operating systems:" comment.

Anyone want to contribute ideas/comments about what is wrong/could be better?

Here's my go at it:

A quick grep of the sources shows that the *only* place the string ".lock" occurs is in the isAlreadyOpen() method of Log (code reproduced below). That is, I suspect it never gets created by hsql(db) code.

Sleuthing a bit, the JavaDoc for lastModified() states:

-------------------------
lastModified

public long lastModified()

Returns the time that the file denoted by this abstract pathname was last modified.

Returns: A long value representing the time the file was last modified, measured in milliseconds since the epoch (00:00:00 GMT, January 1, 1970), or 0L if the file does not exist or if an I/O error occurs
-------------------------

If my conclusion that hsqldb never creates the "database name".lock file is correct, then the lastModified code is, as far as I can tell, useless. It seems to do nothing more than waste 3 seconds every time the method is called, since "if (l1 != l2)" can never be satisfied (it will always be equivalent to "if (0L != 0L)").

Is the f.lastModified code simply a remnant of something that never got completed in hsql1.4 and that nobody has looked at since? I suspect so.

Anyway, looking at the rest of the code, it would seem that the test basically closes the FileInputStream on the .properties file and tries to delete it, stating that the database is open by another process if it cannot delete the file after closing its own stream on it. My gut says this sounds really flaky. Let's go hunting some more...
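As an aside on the lastModified() point above, here is a minimal sketch of why the comparison cannot fire when the file is never created. It is not hsqldb code: the class name LastModifiedProbe, the method changedBetweenReads and the temp-directory file name are all made up for illustration, and the pause is shortened from the 3 seconds used in isAlreadyOpen().

```java
import java.io.File;

// Hypothetical demo class, not part of hsqldb.
final class LastModifiedProbe {

    // Mirrors the first half of isAlreadyOpen(): read lastModified(),
    // pause, read it again, and report whether the two readings differ.
    static boolean changedBetweenReads(File f, long pauseMillis) {
        long l1 = f.lastModified();

        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        long l2 = f.lastModified();

        return l1 != l2;
    }

    public static void main(String[] args) {
        // A path that nothing creates, standing in for sName + ".lock".
        File missing = new File(System.getProperty("java.io.tmpdir"),
                                "no-such-db-" + System.nanoTime() + ".lock");

        System.out.println("exists:       " + missing.exists());
        System.out.println("lastModified: " + missing.lastModified());
        System.out.println("changed:      " + changedBetweenReads(missing, 100));
    }
}
```

For a path that does not exist, both reads return 0L per the JavaDoc quoted above, so the method reports no change however long the pause is.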
Once again, from the JavaDocs:

----------------------
public boolean delete()

Deletes the file or directory denoted by this abstract pathname. If this pathname denotes a directory, then the directory must be empty in order to be deleted.

Returns: true if and only if the file or directory is successfully deleted; false otherwise
---------------------

From my reading of this definition of delete, I understand that false (or an exception) can come back if:

- the file does not exist
- the file is locked by another process (if that is supported on the OS in question)
- the OS encounters an unspecified problem while attempting to delete the file
- etc.

Right up front, it is obvious that a race condition can occur here: two instances can start up and try to delete the .properties file, one succeeding and the other failing because it tries just a moment after the first has deleted it. The order is not guaranteed (and Java execution times vary wildly anyway), so the behaviour is completely non-deterministic w.r.t. who was actually invoked first.

It is conceivable that super-servers taking down and starting up hsqldb instances in rapid succession against the same database inside the same VM could encounter unpredictable results, since Log.close and Log.open are called regularly from other operations, such as shutdown and checkpoint, and indirectly during class finalization (in Database, for instance).

Also, it is unclear to me precisely how long it takes on every platform to release a lock on a file once a stream on it has been closed. I presume that XXXStream.close should not return before the OS has actually guaranteed that the file is released completely. But perhaps there is another strange timing issue here that occasionally results in the delete request being processed before the close has actually resulted in the OS fully releasing any native locks on the file. Could this be another race condition?
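Coming back to the delete() contract quoted above, the ambiguity of a false return can be shown with a small sketch. Again this is not hsqldb code: the class name DeleteProbe, the method deleteTwice and the temp file prefix are assumptions for illustration, and the temp file merely stands in for the .properties file.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class, not part of hsqldb.
final class DeleteProbe {

    // Creates a scratch file and calls delete() on it twice, returning
    // both results. No stream is held open on the file at any point.
    static boolean[] deleteTwice() throws IOException {
        File f = File.createTempFile("hsqldb-demo", ".properties");

        boolean first  = f.delete();    // file exists, nobody has it open
        boolean second = f.delete();    // file is already gone

        return new boolean[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = deleteTwice();

        System.out.println("first delete:  " + r[0]);
        System.out.println("second delete: " + r[1]);
    }
}
```

The second call returns false although no other process is involved, which is the same answer a late-starting instance gets after an earlier one has already deleted the file. A bare false from delete() therefore cannot tell "in use by another process" apart from "no longer there".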
For sure, I have experienced less than perfect file locking timing issues under Win32 before on FAT/FAT32 (as opposed to NTFS, which seems to be pretty stable in this respect).

I think the way to resolve this might be:

1.) Actually create the .lock file if it does not exist (DUH!), and call deleteOnExit() against it after creating it. This guarantees that:

- we will fail to create it if it already exists (it must be that another instance has created it and is still running, or that the hosting VM crashed after allowing another instance to mount the database but before normal shutdown of the instance, which should delete the lock file as the last operation of a normal database shutdown)
- if we do create it, it will be deleted on normal termination of the VM, even if the database engine reaches a non-functioning state before that time.

As stated in the JavaDocs:

----------------
createNewFile

public boolean createNewFile() throws IOException

Atomically creates a new, empty file named by this abstract pathname if and only if a file with this name does not yet exist. The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file. This method, in combination with the deleteOnExit() method, can therefore serve as the basis for a simple but reliable cooperative file-locking protocol.
---------------

2.) If the lock file exists, then either another instance really does have the database open (remember, createNewFile is apparently guaranteed to be atomic), or the VM hosting the last instance that mounted the database crashed (or some fool user created the file, external to the engine...sorry, there is no way to account for user error).
In either case, a simple solution is to refuse to start up until the file is deleted (the other process shuts down normally, the VM is terminated normally, or an admin realizes that the last instance's VM crashed and manually deletes the lock).

3.) Anybody have a good algorithm for a better cooperative file locking protocol? Refusing to start up if the lock file exists defeats the automatic recovery feature of hsql(db) somewhat, as it still requires a small amount of manual intervention that may not be available when controlling things remotely or in an automated fashion, e.g. when controlling an embedded hsqldb instance from a super-server.

So, is it guaranteed on all operating systems that an attempt to delete a file open for write will fail? If so, we can discriminate between a previous crash and a currently mounted database by attempting to delete the lock file. That is, perhaps if an instance keeps a write stream open on the lock file until after normal shutdown, we could use File.delete() or File.canWrite() to do the test. If we have already guaranteed that the lock file exists, is it guaranteed on all platforms that File.delete() or File.canWrite() will return false if the file is already open in write mode elsewhere (e.g. in another process or by an XXXFileStream in the same VM)?

Finally, here is the (probable) offender:

-------------------------
    private boolean isAlreadyOpen() throws SQLException {

        // reading the last modified, wait 3 seconds, read again.
        // if the same information was read the file was not changed
        // and is probably, except the other process is blocked
        if (Trace.TRACE) {
            Trace.trace();
        }

        File f  = new File(sName + ".lock");
        long l1 = f.lastModified();

        try {
            Thread.sleep(3000);
        } catch (Exception e) {}

        long l2 = f.lastModified();

        if (l1 != l2) {
            return true;
        }

        // check by trying to delete the properties file
        // this will not work if some application has the file open
        // this is why the properties file is kept open when running ;-)
        // todo: check if this works in all operating systems
        closeProperties();

        if (Trace.TRACE) {
            Trace.trace();
        }

        if ((new File(sFileProperties)).delete() == false) {
            return true;
        }

        // the file was deleted, so recreate it now
        saveProperties();

        return false;
    }
--------------------------

Hope that helps a bit,
Campbell

----- Original Message -----
From: "Kevin A. Burton" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 07, 2002 2:18 AM
Subject: [Hsqldb-developers] The database is already in use by another process

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Hm...
>
> Has anyone run into this problem on Windows? It seems that all of my windows
> users has this problem (JDK 1.3.1, 1.4) but none of my Linux or MacOSX friends
> seem to have this.
>
> I looked into the code... it seems that this is either a VM/filesystem bug or
> something is REALLY changing the file.
>
> Considering it only happens under Windows I suspect the VM/filesystem bug.
>
> java.sql.SQLException: The database is already in use by another process
>     at org.hsqldb.Trace.getError(Trace.java:180)
>     at org.hsqldb.Trace.getError(Trace.java:144)
>     at org.hsqldb.Trace.error(Trace.java:192)
>     at org.hsqldb.Log.open(Log.java:208)
>     at org.hsqldb.Database.<init>(Database.java:96)
>     at org.hsqldb.jdbcConnection.openStandalone(jdbcConnection.java:926)
>     at org.hsqldb.jdbcConnection.<init>(jdbcConnection.java:682)
>     at org.hsqldb.jdbcDriver.connect(jdbcDriver.java:116)
>     at java.sql.DriverManager.getConnection(DriverManager.java:512)
>     at java.sql.DriverManager.getConnection(DriverManager.java:171)
>     at org.apache.turbine.util.db.adapter.DBHypersonicSQL.getConnection(DBHypersonicSQL.java:94)
>     at org.apache.turbine.util.db.pool.ConnectionPool.getNewConnection(ConnectionPool.java:498)
>     at org.apache.turbine.util.db.pool.ConnectionPool.getConnection(ConnectionPool.java:339)
>     at org.apache.turbine.services.db.TurbinePoolBrokerService.getConnection(TurbinePoolBrokerService.java:171)
>     at org.apache.turbine.services.db.TurbineDB.getConnection(TurbineDB.java:194)
>     at org.apache.turbine.om.peer.BasePeer.executeQuery(BasePeer.java:1249)
>     at org.apache.turbine.om.peer.BasePeer.executeQuery(BasePeer.java:1202)
>
> Kevin
>
> - --
> Kevin A. Burton ( [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] )
> Location - San Francisco, CA, Cell - 415.595.9965
> Jabber - [EMAIL PROTECTED], Web - http://relativity.yi.org/
>
> We can plant a house, we can build a tree
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (GNU/Linux)
> Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt
>
> iD8DBQE8hyJTAwM6xb2dfE0RAoIsAKCK7t7qjO4yTJjtlXrk0dbqsUaPSwCfX/AB
> 1JIgLTFCbo5Xzmaqw5o5vWQ=
> =Jppw
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> hsqldb-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-developers