[
https://issues.apache.org/jira/browse/DERBY-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Hillegas updated DERBY-2254:
---------------------------------
Attachment: derby-2254-01-ab-accountForMoreOverhead.diff
Attaching derby-2254-01-ab-accountForMoreOverhead.diff, a candidate fix for
this bug. After applying this patch, I can't reproduce the bug using the
attached repro. I have run the repro for more than 1500 iterations.
This is my understanding of what is going on here:
A) The original assertion is supposed to check that you are not creating an
illegal log instant. A log instant consists of a log file number and an offset
into that file. A log instant is illegal if the file number or offset are too
big. In this particular case, the assertion fails because the offset is too big.
B) The offset is the actual end of the current log file. That is, the log file
has already spilled past its legal end.
C) The assertion is triggered after the log writer notices that the log file
needs to be switched. The log writer ought to detect this condition and switch
the log file BEFORE writing a log record which spills over the end.
Unfortunately, the log writer only notices this condition AFTER a log record
has been written which straddles the legal end of file.
D) We check for whether we should switch by asking whether the remaining bytes
we need to write will spill past the legal end of the file. To compute the
number of bytes we expect to write, we add the length of the log wrapper
overhead to the length of the current log record. However, it turns out that
may need to write more bytes than that. The calculation does not include an
extra four bytes needed to write a terminating 0 to the log. The calculation
also does not include the bytes needed to write a checksum record.
E) If we are already very close to the legal end of the log file, then we could
spill past the end after writing the wrapper, the log record, the terminating
0, and the checksum record. I suppose that we get here if the checkpointing
thread can't get in often enough to switch log files before we encroach on the
end of the current file.
F) The fix is to require that the log accomodate that extra overhead (the
terminating 0 and the checksum record). Now we force a log switch if there is
not enough room to write the current log record, its wrapper, the terminating
0, and the checksum record.
I have made 3 changes:
1) The actual change to LogToFile to increase the size of the overhead space we
require to avoid a log switch.
2) All of the places in LogToFile which update the remembered endPostion have
been reworked to call a brief method which sanity checks whether we have
spilled past the end of the file already. I put in this instrumentation in
order to find where the extra bytes were being written. I think it is a useful
sanity check but I can remove it if people think that it will slow down log
processing significantly.
3) I added a little more information to the original assertion and its friends
in LogCounter.
Touches the following files:
----------
M java/engine/org/apache/derby/impl/store/raw/log/LogAccessFile.java
Exposed the length of the checksum record so that LogToFile can accurately
calculate its overhead.
----------
M java/engine/org/apache/derby/impl/store/raw/log/LogCounter.java
Items (1) and (2) as described above.
----------
M java/engine/org/apache/derby/impl/store/raw/log/LogToFile.java
Item (3) above.
> Assert during log file switch: log file position exceeded max log file size
> ---------------------------------------------------------------------------
>
> Key: DERBY-2254
> URL: https://issues.apache.org/jira/browse/DERBY-2254
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.3.1.4
> Environment: Solaris 10, Java SE 6 build 104
> Reporter: Olav Sandstaa
> Assignee: Rick Hillegas
> Labels: derby_triage10_5_2
> Attachments: d2254.zip, derby-2254-01-ab-accountForMoreOverhead.diff
>
>
> When running simple tpc-b like transactions against a embedded Derby based on
> a SANE build of trunk the following assertion occurs for the background
> thread and all user threads:
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log
> file position exceeded max log file size
> This seems to occur during a switch to a new log file.
> derby.log contains the following call stack for the background thread:
> Exception trace:
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file
> position exceeded max log file size
> at
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
> at
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
> at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(LogToFile.java:1540)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(LogToFile.java:1357)
> at
> org.apache.derby.impl.store.raw.RawStore.checkpoint(RawStore.java:439)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.performWork(LogToFile.java:3416)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
> at java.lang.Thread.run(Thread.java:619)
> 2007-01-17 23:09:48.638 GMT Thread[derby.rawStoreDaemon,5,derby.daemons]
> Cleanup action starting
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file
> position exceeded max log file size
> at
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
> at
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
> at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(LogToFile.java:1540)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(LogToFile.java:1357)
> at
> org.apache.derby.impl.store.raw.RawStore.checkpoint(RawStore.java:439)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.performWork(LogToFile.java:3416)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
> at
> org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
> at java.lang.Thread.run(Thread.java:619)
> Cleanup action completed
> For my user threads the call stack is similar:
> Database Class Loader started - derby.database.classpath=''
> 2007-01-17 23:09:36.401 GMT Thread[Thread-51,5,main] (XID = 12632406),
> (SESSIONID = 51), (DATABASE = /export/home/tmp/derby-db), (DRDAID = null),
> Cleanup action starting
> 2007-01-17 23:09:36.401 GMT Thread[Thread-51,5,main] (XID = 12632406),
> (SESSIONID = 51), (DATABASE = /export/home/tmp/derby-db), (DRDAID = null),
> Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND
> bid = ?
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file
> position exceeded max log file size
> at
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
> at
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
> at
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
> at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
> at
> org.apache.derby.impl.store.raw.data.LoggableActions.doAction(LoggableActions.java:221)
> at
> org.apache.derby.impl.store.raw.data.LoggableActions.actionUpdate(LoggableActions.java:85)
> at
> org.apache.derby.impl.store.raw.data.StoredPage.doUpdateAtSlot(StoredPage.java:8521)
> at
> org.apache.derby.impl.store.raw.data.BasePage.updateAtSlot(BasePage.java:1108)
> at
> org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.replace(GenericConglomerateController.java:479)
> at
> org.apache.derby.impl.sql.execute.RowChangerImpl.updateRow(RowChangerImpl.java:523)
> at
> org.apache.derby.impl.sql.execute.UpdateResultSet.collectAffectedRows(UpdateResultSet.java:566)
> at
> org.apache.derby.impl.sql.execute.UpdateResultSet.open(UpdateResultSet.java:260)
> at
> org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:358)
> at
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1182)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1652)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:299)
> at
> com.sun.derby.perf.clients.tpcb.DBConnection.performTransaction(DBConnection.java:595)
> at com.sun.derby.perf.clients.tpcb.Client.run(Client.java:218)
> After this it seems like no user threads are able to connect to the database
> (not surpricing).
> This happend using a SANE build and I am unsure by just looking at the
> assertion in the code whether this situation would have been fatal also if I
> had been using an INSANE build.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira