[ 
https://issues.apache.org/jira/browse/DERBY-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Hillegas updated DERBY-2254:
---------------------------------

    Attachment: derby-2254-01-ab-accountForMoreOverhead.diff

Attaching derby-2254-01-ab-accountForMoreOverhead.diff, a candidate fix for 
this bug. After applying this patch, I can't reproduce the bug using the 
attached repro. I have run the repro for more than 1500 iterations.

This is my understanding of what is going on here:

A) The original assertion is supposed to check that you are not creating an 
illegal log instant. A log instant consists of a log file number and an offset 
into that file. A log instant is illegal if the file number or offset are too 
big. In this particular case, the assertion fails because the offset is too big.

B) The offset is the actual end of the current log file. That is, the log file 
has already spilled past its legal end.

C) The assertion is triggered after the log writer notices that the log file 
needs to be switched. The log writer ought to detect this condition and switch 
the log file BEFORE writing a log record which spills over the end. 
Unfortunately, the log writer only notices this condition AFTER a log record 
has been written which straddles the legal end of file.

D) We check for whether we should switch by asking whether the remaining bytes 
we need to write will spill past the legal end of the file. To compute the 
number of bytes we expect to write, we add the length of the log wrapper 
overhead to the length of the current log record. However, it turns out that 
may need to write more bytes than that. The calculation does not include an 
extra four bytes needed to write a terminating 0 to the log. The calculation 
also does not include the bytes needed to write a checksum record.

E) If we are already very close to the legal end of the log file, then we could 
spill past the end after writing the wrapper, the log record, the terminating 
0, and the checksum record. I suppose that we get here if the checkpointing 
thread can't get in often enough to switch log files before we encroach on the 
end of the current file.

F) The fix is to require that the log accomodate that extra overhead (the 
terminating 0 and the checksum record). Now we force a log switch if there is 
not enough room to write the current log record, its wrapper, the terminating 
0, and the checksum record.


I have made 3 changes:

1) The actual change to LogToFile to increase the size of the overhead space we 
require to avoid a log switch.

2) All of the places in LogToFile which update the remembered endPostion have 
been reworked to call a brief method which sanity checks whether we have 
spilled past the end of the file already. I put in this instrumentation in 
order to find where the extra bytes were being written. I think it is a useful 
sanity check but I can remove it if people think that it will slow down log 
processing significantly.

3) I added a little more information to the original assertion and its friends 
in LogCounter.


Touches the following files:

----------

M      java/engine/org/apache/derby/impl/store/raw/log/LogAccessFile.java

Exposed the length of the checksum record so that LogToFile can accurately 
calculate its overhead.

----------

M      java/engine/org/apache/derby/impl/store/raw/log/LogCounter.java

Items (1) and (2) as described above.

----------

M      java/engine/org/apache/derby/impl/store/raw/log/LogToFile.java

Item (3) above.


> Assert during log file switch: log file position exceeded max log file size
> ---------------------------------------------------------------------------
>
>                 Key: DERBY-2254
>                 URL: https://issues.apache.org/jira/browse/DERBY-2254
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.3.1.4
>         Environment: Solaris 10, Java SE 6 build 104 
>            Reporter: Olav Sandstaa
>            Assignee: Rick Hillegas
>              Labels: derby_triage10_5_2
>         Attachments: d2254.zip, derby-2254-01-ab-accountForMoreOverhead.diff
>
>
> When running simple tpc-b like transactions against a embedded Derby based on 
> a SANE build of trunk the following assertion occurs for the background 
> thread and all user threads:
>    org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log 
> file position exceeded max log file size
> This seems to occur during a switch to a new log file.
> derby.log contains the following call stack for the background thread:
> Exception trace: 
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file 
> position exceeded max log file size
>       at 
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
>       at 
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
>       at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(LogToFile.java:1540)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(LogToFile.java:1357)
>       at 
> org.apache.derby.impl.store.raw.RawStore.checkpoint(RawStore.java:439)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.performWork(LogToFile.java:3416)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
>       at java.lang.Thread.run(Thread.java:619)
> 2007-01-17 23:09:48.638 GMT Thread[derby.rawStoreDaemon,5,derby.daemons] 
> Cleanup action starting
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file 
> position exceeded max log file size
>       at 
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
>       at 
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
>       at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(LogToFile.java:1540)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(LogToFile.java:1357)
>       at 
> org.apache.derby.impl.store.raw.RawStore.checkpoint(RawStore.java:439)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.performWork(LogToFile.java:3416)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(BasicDaemon.java:331)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.work(BasicDaemon.java:668)
>       at 
> org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:394)
>       at java.lang.Thread.run(Thread.java:619)
> Cleanup action completed
> For my user threads the call stack is similar:
> Database Class Loader started - derby.database.classpath=''
> 2007-01-17 23:09:36.401 GMT Thread[Thread-51,5,main] (XID = 12632406), 
> (SESSIONID = 51), (DATABASE = /export/home/tmp/derby-db), (DRDAID = null), 
> Cleanup action starting
> 2007-01-17 23:09:36.401 GMT Thread[Thread-51,5,main] (XID = 12632406), 
> (SESSIONID = 51), (DATABASE = /export/home/tmp/derby-db), (DRDAID = null), 
> Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND 
> bid = ?
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED log file 
> position exceeded max log file size
>       at 
> org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogCounter.makeLogInstantAsLong(LogCounter.java:120)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.switchLogFile(LogToFile.java:1900)
>       at 
> org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3530)
>       at 
> org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:345)
>       at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1185)
>       at 
> org.apache.derby.impl.store.raw.data.LoggableActions.doAction(LoggableActions.java:221)
>       at 
> org.apache.derby.impl.store.raw.data.LoggableActions.actionUpdate(LoggableActions.java:85)
>       at 
> org.apache.derby.impl.store.raw.data.StoredPage.doUpdateAtSlot(StoredPage.java:8521)
>       at 
> org.apache.derby.impl.store.raw.data.BasePage.updateAtSlot(BasePage.java:1108)
>       at 
> org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.replace(GenericConglomerateController.java:479)
>       at 
> org.apache.derby.impl.sql.execute.RowChangerImpl.updateRow(RowChangerImpl.java:523)
>       at 
> org.apache.derby.impl.sql.execute.UpdateResultSet.collectAffectedRows(UpdateResultSet.java:566)
>       at 
> org.apache.derby.impl.sql.execute.UpdateResultSet.open(UpdateResultSet.java:260)
>       at 
> org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:358)
>       at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1182)
>       at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1652)
>       at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:299)
>       at 
> com.sun.derby.perf.clients.tpcb.DBConnection.performTransaction(DBConnection.java:595)
>       at com.sun.derby.perf.clients.tpcb.Client.run(Client.java:218)
> After this it seems like no user threads are able to connect to the database 
> (not surpricing).
> This happend using a SANE build and I am unsure by just looking at the 
> assertion in the code whether this situation would have been fatal also if I 
> had been using an INSANE build.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to