Re: Cannot flush the log file to disk error - may need some code changes here

Matt Sanford Tue, 02 Aug 2005 11:52:58 -0700

Hello all,

    Issue Fixed, below is the cause info for anyone interested and for the mail archives. Thanks again for all of the info. First rule of code reuse: you screwed up, not them.

    I did a review of all code in my system calling interrupt() and found the problem in what I considered an unrelated place. Prior to my database transaction my code was checking something that required a timeout. If the thread timed out I sent it an interrupt. As it turned out, to support the check taking less than the total allowed time it could also interrupt it's parent to say it had completed. So, the short-short version looked something like this :

Thread-1 :

try {
    doCheckInNewThread(childThread); // See Thread-2
    Thread.sleep(timeout * 1000);
    // timed out.
    childThread.stopCheck(); // interrupts and sets a var.
    // throw timeout exception here
} catch (InterruptedException ie) {
    // ignore. the child completed in less than the timeout
}

Thread-2

// inside doCheckInNewThread
try {
    // long check here
} catch (InterruptedException ie) {
    // my parent has said it's time to come home for dinner.
} finally {
    // tell my parent i finished
    parentThread.interrupt();
}

    The time line to cause the problem was :

1. Thread-1 starts check
2. Thread-2 starts check that takes too long
3. Thread-1 detects timeout, sends interrupt
4. Thread-2 moves into catch block
5. Thread-1 passes it's catch block, moves on to database work
6. Thread-2 reaches finally, calls interrupt on parent
7. Thread-1 barfs when interrupted awaiting I/O operation
8. Matt weeps.

    This only happened under heavy load because it needed time-slicing to get in between the catch and finally in Thread-2. Tricky little sucker. Changed the finally in Thread-2 to check the quit variable (set in stopCheck) before calling interrupt.

Thanks;
-- Matt Sanford

[EMAIL PROTECTED] wrote:

As was pointed out your error is likely caused by something
sending an interrupt at an inopportune time to the derby thread.
These thread interrupts are a big problem for the database I/O
portion of derby, and so far I have not found an optimal way to
deal with these.  If any java experts have a good idea here I
would appreciate it.  In the "c" world I would just block
interrupts during the db I/O.



Getting an interrupt while attempting to sync the log file is
especially bad as there are only 2 options available to maintain
a recoverable database: retry or crash the database server.  In
key pieces of code we have tried to code retry logic, and this
has addressed most of the issues in the past.  It looks like
this one is a relatively new code path in the code and is JVM
specific to derby usage on jdk 1.4 and higher.

LogAccessFile.syncLogAccessFile attempts to retry, but it looks
like the exception being thrown is probably different depending
on the JVM.  We currently retry for SyncFailedException in
LogAccessFile, but in
this case it is getting a ClosedByInterruptException - see code
in org.apache.derby.impl.io.DirRandomAccessFile4.sync.

So the question is, should we retry for a ClosedByInterruptException?
My guess is that just doing another force call just gets the same
exception, so we may have reopen the whole channel.  I lean toward
that we should do whatever necessary to retry, as this error just
causes too many problems for applications.



Matt Sanford wrote:

Thanks for the speedy assistance. I added code to display the next
SQLException via getNextMessage() and got back :

java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)

at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:359)
at org.apache.derby.impl.io.DirRandomAccessFile4.sync(Unknown Source)
at
org.apache.derby.impl.store.raw.log.LogAccessFile.syncLogAccessFile(Unknown
Source)
at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source)
at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source)
at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source)
at org.apache.derby.impl.store.access.RAMTransaction.commit(Unknown Source)
at
org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown
Source)
at
org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userCommit(Unknown
Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.commit(Unknown
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.commit(Unknown Source)
...
...

Based on the info in the JDK javadocs it looks like I may have some
concurrent access somewhere. I will try and make a smallest-possible
test scenario and see if I can remove the problem.

Thanks again;
-- Matt Sanford


[EMAIL PROTECTED] wrote:

Hi Matt ,

I came across this error only when the disk is full or when there are
file system errors. Surprised to find that you don't see log flush SQL
Exception in the derby.log. May be the error log is getting
reinitialized when you reboot after a shutdown inside your program.
You can avoid reinitialization of error log by adding
derby.infolog.append=true to the derby.properties.

On the side note , if u are using Derby version before 10.1 , nested
exception were not getting logged into the derby.log file in some
cases. I think log can not flush log exception is also one of them.
This issue is fixed in 10.1 ( JIRA ENTRY : DERBY-237) .

If you can not change the derby version , In your program , you may
want to print the nested SQL Exceptions, real IO error might show up
if you do that.

calling SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE() can only clean up log
files , if the log files are not needed for recovery.
On rare cases I came across programs that starts a transaction in the
beginning and never commits them, when that happens log files can not
be reclaimed; because log files will be needed to rollback that
particular transaction. This can lead to disk full scenario. Your log
file number is small(log10) , so I doubt that is the case!.

You may also want to check the permissions for the files.


Thanks
-suresht

Matt Sanford wrote:

Hello,

I am using and embedded Derby database to replace a large SortedSet
in an RMI server program running on linux[1]. This program is
allowing both submission of new data and reading of data to manage a
queue of work. My program works fine at low load, but at high load I
am getting a SQLException with the message "Cannot flush the log file
to disk /usr/workSetDB/log/log10.dat" (filename varies). All access
to the Derby code is being synchronized to prevent problems with
transactional integrity. I followed the stack trace for the exception
(shown below[2]) but it is unclear to me what would be causing the
problem in the flush method. There are no disk or other errors in the
/var/log/messages file and I have attempted to move the logs dir off
to another device to see if that would help, but no luck.

To try and correct the problem I have also tried a spare thread
calling SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE() at an interval (1
second and 5 seconds) but that has also not helped the problem. Once
the problem occurs my program shuts down[3] the database and
reconnects using the same code it used during initial startup. On
doing the shutdown and reconnect my code calls setReadOnly(false),
which throws an exception with the message "A read-only user or a
user in a read-only database is not permitted to disable read-only
mode on a connection.". Stopping my program completely and restarting
under a new JVM seems to correct the problem.
I have turned up the logging with some derby.properties changes[4],
but the final SQL statement logged is not the insert that failed but
rather the select prior to it and the only error in the log file is
for the read-only problem, not the cause of the initial problem. Has
anyone seen problems with the transaction log flushing under heavy
write load ? I could find almost no reports of these errors when
searching.

Thanks;
-- Matt Sanford

[1] RedHat AS 2.1 with 4 CPUs and 4GB RAM. Using ext3 file systems.
Java Version : Sun j2sdk1.4.2_03, using options : -server -Xms1600M
-Xmx1600M -XX:+AggressiveHeap

[2] -- Stack Trace --
ERROR XSLA0: Cannot flush the log file to disk
/usr/workSetDB/log/log10.dat.
at org.apache.derby.iapi.error.StandardException.newException(Unknown
Source)
at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown
Source)
at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source)
at org.apache.derby.impl.store.raw.xact.Xact.commit(Unknown Source)
at org.apache.derby.impl.store.access.RAMTransaction.commit(Unknown
Source)
at
org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown
Source)
at
org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userCommit(Unknown
Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.commit(Unknown
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.commit(Unknown Source)
at
com.mycode.workset.collection.JDBCPendingWorkCollection.addPendingWork(JDBCPendingWorkCollection.java:182)

...
...


[3] Shutting down with :

try {
DriverManager.getConnection(myurl + ";shutdown=true");
} catch (SQLException e) {
// catch the "SQL Exception: Derby system shutdown." message.
log.info("Database shutdown complete.");
}

[4] derby.properties

derby.stream.error.file=/tmp/derby.log
derby.language.logStatementText=true
derby.stream.error.logSeverityLevel=0

Re: Cannot flush the log file to disk error - may need some code changes here

Reply via email to