|
Hello all, Issue Fixed, below is the cause info for anyone interested and for the mail archives. Thanks again for all of the info. First rule of code reuse: you screwed up, not them. I did a review of all code in my system calling interrupt() and found the problem in what I considered an unrelated place. Prior to my database transaction my code was checking something that required a timeout. If the thread timed out I sent it an interrupt. As it turned out, to support the check taking less than the total allowed time it could also interrupt it's parent to say it had completed. So, the short-short version looked something like this : Thread-1 : try { doCheckInNewThread(childThread); // See Thread-2 Thread.sleep(timeout * 1000); // timed out. childThread.stopCheck(); // interrupts and sets a var. // throw timeout exception here } catch (InterruptedException ie) { // ignore. the child completed in less than the timeout } Thread-2 // inside doCheckInNewThread try { // long check here } catch (InterruptedException ie) { // my parent has said it's time to come home for dinner. } finally { // tell my parent i finished parentThread.interrupt(); } The time line to cause the problem was : 1. Thread-1 starts check 2. Thread-2 starts check that takes too long 3. Thread-1 detects timeout, sends interrupt 4. Thread-2 moves into catch block 5. Thread-1 passes it's catch block, moves on to database work 6. Thread-2 reaches finally, calls interrupt on parent 7. Thread-1 barfs when interrupted awaiting I/O operation 8. Matt weeps. This only happened under heavy load because it needed time-slicing to get in between the catch and finally in Thread-2. Tricky little sucker. Changed the finally in Thread-2 to check the quit variable (set in stopCheck) before calling interrupt. Thanks; -- Matt Sanford [EMAIL PROTECTED] wrote: As was pointed out your error is likely caused by something sending an interrupt at an inopportune time to the derby thread. These thread interrupts are a big problem for the database I/O portion of derby, and so far I have not found an optimal way to deal with these. If any java experts have a good idea here I would appreciate it. In the "c" world I would just block interrupts during the db I/O.Getting an interrupt while attempting to sync the log file is especially bad as there are only 2 options available to maintain a recoverable database: retry or crash the database server. In key pieces of code we have tried to code retry logic, and this has addressed most of the issues in the past. It looks like this one is a relatively new code path in the code and is JVM specific to derby usage on jdk 1.4 and higher. LogAccessFile.syncLogAccessFile attempts to retry, but it looks like the exception being thrown is probably different depending on the JVM. We currently retry for SyncFailedException in LogAccessFile, but in this case it is getting a ClosedByInterruptException - see code in org.apache.derby.impl.io.DirRandomAccessFile4.sync. So the question is, should we retry for a ClosedByInterruptException? My guess is that just doing another force call just gets the same exception, so we may have reopen the whole channel. I lean toward that we should do whatever necessary to retry, as this error just causes too many problems for applications. Matt Sanford wrote:Thanks for the speedy assistance. I added code to display the next SQLException via getNextMessage() and got back : java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:359) at org.apache.derby.impl.io.DirRandomAccessFile4.sync(Unknown Source) at org.apache.derby.impl.store.raw.log.LogAccessFile.syncLogAccessFile(Unknown Source) at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source) at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source) at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source) at org.apache.derby.impl.store.access.RAMTransaction.commit(Unknown Source) at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown Source) at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userCommit(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.commit(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.commit(Unknown Source) ... ... Based on the info in the JDK javadocs it looks like I may have some concurrent access somewhere. I will try and make a smallest-possible test scenario and see if I can remove the problem. Thanks again; -- Matt Sanford [EMAIL PROTECTED] wrote: |
- Re: Cannot flush the log file to disk error - may need some... Matt Sanford
- Re: Cannot flush the log file to disk error - may need... Mike Matrigali
