Hi Dibyendu,
Nice write-up. I think the comments are out of date in a couple of cases:
>>Derby implements the Write Ahead Log using a non-circular file system file. At present, there is no support for incremental log backup or media recovery. Only crash recovery is supported.
I think Derby does support simple media recovery. It supports full backup/restore and a very basic form of rollforward recovery (replaying the log using a backup and the archived log files).
>>Everytime a checkpoint is taken, a new log file is created and all subsequent log records will go to the new log file. After a checkpoint is taken, old and useless log files will be deleted.
I think a log switch does not always happen on a checkpoint. With the default values, a log switch happens when a log file grows beyond 1MB, and
a checkpoint happens when the amount of log written since the last checkpoint is 10MB or more.
LogToFile.java:
private static final int DEFAULT_LOG_SWITCH_INTERVAL = 1024*1024;
private static final int CHECKPOINT_INTERVAL_MAX = 128*1024*1024;
>> 1. The log file grows beyond a certain size (configurable, default 100K bytes)
It looks like the comments are out of date. The default is 1MB.
LogToFile.java:
private static final int DEFAULT_LOG_SWITCH_INTERVAL = 1024*1024;
Thanks -suresht
Dibyendu Majumdar wrote:
From: "Jean T. Anderson" <[EMAIL PROTECTED]>
Dibyendu Majumdar's doc on "Derby On Disk Page Format" is now live at http://incubator.apache.org/derby/papers/pageformats.html.
Thanks. Attached is another document :-)
Regards
------------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document> <header> <title>Derby Write Ahead Log Format</title>
<abstract>This document describes the storage format of the Derby Write Ahead Log. This is a work in progress derived from Javadoc comments and from explanations Mike Matrigali posted to the Derby lists. Please post questions, comments, and corrections to [EMAIL PROTECTED]
</abstract>
</header>
<body>
<section id="introduction"> <title> Introduction </title>
<p>
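Derby implements the Write Ahead Log using non-circular file system files.
Besides crash recovery, Derby supports full backup and restore and a basic form of rollforward (media) recovery, which replays the log using a backup and archived log files. There is no support for incremental log backup. </p>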
<p>
The 'log' is a stream of log records, implemented as a series of numbered
log files. These log files are logically continuous, so a transaction can
have log records that span multiple log files. A single log record, however,
cannot span more than one log file. Log file numbers increase monotonically.
</p>
<p>
The log belongs to a log factory of a RawStore. In the current implementation,
each RawStore has only one log factory, so each RawStore has only one log
(which is composed of multiple log files).
At any given time, a log factory writes new log records to only one log file,
called the 'current log file'.
</p>
<p>
A log file is named log<em>logNumber</em>.dat
</p>
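<p>
The naming rule can be illustrated with a small sketch. <code>LogFileNames</code> and its
methods are hypothetical helpers, not Derby's API, and the idea that the highest-numbered
file is the current log file is an assumption drawn from the monotonically increasing
log file numbers described above.
</p>

```java
/**
 * Sketch of the naming rule "log" + logNumber + ".dat".
 * LogFileNames and its methods are hypothetical, not Derby's API.
 */
final class LogFileNames {

    /** Returns the file name for a log file number, e.g. 3 gives "log3.dat". */
    static String fileName(long logNumber) {
        return "log" + logNumber + ".dat";
    }

    /** Returns the number encoded in a file name, or -1 if the name does not follow the rule. */
    static long logNumber(String name) {
        if (!name.startsWith("log") || !name.endsWith(".dat")) {
            return -1;
        }
        String digits = name.substring(3, name.length() - 4);
        if (digits.isEmpty()) {
            return -1;
        }
        for (char c : digits.toCharArray()) {
            if ("0123456789".indexOf(c) == -1) { // only ASCII digits are accepted
                return -1;
            }
        }
        try {
            return Long.parseLong(digits);
        } catch (NumberFormatException tooLarge) {
            return -1;
        }
    }

    /**
     * Returns the highest log file number among the names, or -1 if none match.
     * Assumption: since numbers only increase, this is the current log file.
     */
    static long highestLogNumber(String[] names) {
        long highest = -1;
        for (String name : names) {
            highest = Math.max(highest, logNumber(name));
        }
        return highest;
    }
}
```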
<p>
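A new log file is started (a 'log switch') when the current log file grows beyond
the log switch interval (configurable, default 1MB); all subsequent log records go to
the new log file. A checkpoint is taken when the amount of log written since the last
checkpoint reaches the checkpoint interval (configurable, default 10MB), so a log
switch does not always coincide with a checkpoint. After a checkpoint is taken, old
log files that are no longer needed are deleted.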
</p>
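<p>
The independence of the two triggers can be shown with a simplified counting model.
<code>LogSwitchPolicy</code> is hypothetical, not Derby code. It uses the default intervals
quoted in the reply above (1MB and 10MB), and it does not try to reproduce how Derby
actually coordinates log switches with checkpoints.
</p>

```java
/**
 * Simplified model of two independent triggers: a log switch when the current
 * log file grows beyond the log switch interval, and a checkpoint when the log
 * written since the last checkpoint reaches the checkpoint interval.
 * LogSwitchPolicy is hypothetical, not Derby code.
 */
final class LogSwitchPolicy {
    static final long DEFAULT_LOG_SWITCH_INTERVAL = 1024 * 1024;       // 1MB
    static final long DEFAULT_CHECKPOINT_INTERVAL = 10L * 1024 * 1024; // 10MB

    private final long logSwitchInterval;
    private final long checkpointInterval;
    private long bytesInCurrentFile;
    private long bytesSinceCheckpoint;
    private int logSwitches;
    private int checkpoints;

    LogSwitchPolicy(long logSwitchInterval, long checkpointInterval) {
        this.logSwitchInterval = logSwitchInterval;
        this.checkpointInterval = checkpointInterval;
    }

    LogSwitchPolicy() {
        this(DEFAULT_LOG_SWITCH_INTERVAL, DEFAULT_CHECKPOINT_INTERVAL);
    }

    /** Accounts for one log record of the given size in bytes. */
    void recordWritten(long bytes) {
        bytesInCurrentFile += bytes;
        bytesSinceCheckpoint += bytes;
        // Log switch: later records go to a new, higher-numbered log file.
        if (bytesInCurrentFile > logSwitchInterval) {
            logSwitches++;
            bytesInCurrentFile = 0;
        }
        // Checkpoint: counted separately, so it need not coincide with a switch.
        if (bytesSinceCheckpoint >= checkpointInterval) {
            checkpoints++;
            bytesSinceCheckpoint = 0;
        }
    }

    int logSwitches() { return logSwitches; }

    int checkpoints() { return checkpoints; }
}
```

<p>
With the default intervals, this model performs roughly ten log switches for every checkpoint.
</p>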
<p>
RawStore exposes a checkpoint method that clients can call; in addition, the
RawStore takes a checkpoint automatically when:
</p>
<ol>
<li> The amount of log written since the last checkpoint reaches the checkpoint interval (configurable, default 10MB)</li>
<li> RawStore is shutdown and a checkpoint hasn't been done "for a while"</li>
<li> RawStore is recovered and a checkpoint hasn't been done "for a while"</li>
</ol>
</section>
<section>
<title>
Format of Write Ahead Log
</title>
<p>
The file-based log is implemented by <code>org.apache.derby.impl.store.raw.log.LogToFile</code>.
This LogFactory is responsible for the formats of two kinds of file, the log
file and the log control file, as well as for the format of the
log record wrapper.
</p>
<section>
<title>Format of Log Control File</title>
<p>The log control file contains information about which log files
are present and where the last checkpoint log record is located.</p>
<table>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td>int</td>
<td>format id set to FILE_STREAM_LOG_FILE</td>
</tr>
<tr>
<td>int</td>
<td>obsolete log file version</td>
</tr>
<tr>
<td>long</td>
<td>the log instant (LogCounter) of the last completed checkpoint</td>
</tr>
<tr>
<td>int</td>
<td>JBMS (older name for Cloudscape/Derby) version</td>
</tr>
<tr>
<td>int</td>
<td>checkpoint interval</td>
</tr>
<tr>
<td>long</td>
<td>spare (value set to 0)</td>
</tr>
<tr>
<td>long</td>
<td>spare (value set to 0)</td>
</tr>
<tr>
<td>long</td>
<td>spare (value set to 0)</td>
</tr>
</table>
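<p>
A sketch of the control file layout, field by field in table order. <code>LogControlFile</code>
is a hypothetical class. It assumes the big-endian byte order of <code>java.io.DataOutput</code>
(which is also <code>ByteBuffer</code>'s default). The numeric value of FILE_STREAM_LOG_FILE is
not given in this document, so the caller supplies the format id. The sketch follows the table
only and is not meant to match any particular Derby release byte for byte.
</p>

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical model of the log control file fields, in table order.
 * ByteBuffer is big-endian by default, like java.io.DataOutput.
 */
final class LogControlFile {
    /** Four ints and four longs: 48 bytes. */
    static final int SIZE = 4 * Integer.BYTES + 4 * Long.BYTES;

    final int formatId;           // FILE_STREAM_LOG_FILE (value supplied by caller)
    final int obsoleteVersion;    // obsolete log file version
    final long checkpointInstant; // LogCounter of the last completed checkpoint
    final int jbmsVersion;        // JBMS (Cloudscape/Derby) version
    final int checkpointInterval;

    LogControlFile(int formatId, int obsoleteVersion, long checkpointInstant,
                   int jbmsVersion, int checkpointInterval) {
        this.formatId = formatId;
        this.obsoleteVersion = obsoleteVersion;
        this.checkpointInstant = checkpointInstant;
        this.jbmsVersion = jbmsVersion;
        this.checkpointInterval = checkpointInterval;
    }

    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putInt(formatId)
           .putInt(obsoleteVersion)
           .putLong(checkpointInstant)
           .putInt(jbmsVersion)
           .putInt(checkpointInterval)
           .putLong(0L)  // spare
           .putLong(0L)  // spare
           .putLong(0L); // spare
        return buf.array();
    }

    static LogControlFile fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int formatId = buf.getInt();
        int obsoleteVersion = buf.getInt();
        long checkpointInstant = buf.getLong();
        int jbmsVersion = buf.getInt();
        int checkpointInterval = buf.getInt();
        // The three spare longs that follow are ignored.
        return new LogControlFile(formatId, obsoleteVersion, checkpointInstant,
                                  jbmsVersion, checkpointInterval);
    }
}
```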
</section>
<section>
<title>Format of the log file</title>
<p>The log file contains log records which record all the changes
to the database. The complete transaction log is composed of a series of
log files.</p>
<table>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td>int</td>
<td>Format id of this log file, set to FILE_STREAM_LOG_FILE.</td>
</tr>
<tr>
<td>int</td>
<td>Obsolete log file version - not used</td>
</tr>
<tr>
<td>long</td>
<td>Log file number - this number orders the log files in a
series to form the complete transaction log
</td>
</tr> <tr>
<td>long</td>
<td>PrevLogRecord - log instant of the previous log record, in the
previous log file.</td>
</tr>
<tr>
<td>[log record wrapper]*</td>
<td>one or more log records with wrapper</td>
</tr>
<tr>
<td>int</td>
<td>EndMarker - value of zero. A log record wrapper begins with the length
of the log record, which is never zero, so a zero int marks the end of the records.
</td>
</tr>
<tr>
<td>[int fuzzy end]*</td>
<td>zero or more ints of value 0, present if this log file has been
recovered and an incomplete log record was overwritten with zeros.
</td>
</tr>
</table>
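<p>
The layout above can be exercised with a small writer and forward scanner.
<code>LogFileSketch</code> is hypothetical. Writing 0 for the obsolete version field is an
arbitrary choice, and the sketch relies only on what the table states: a 24-byte header,
wrappers whose leading length is never zero, and a zero end marker that may be followed
by more zeros.
</p>

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical writer and forward scanner for the log file layout:
 * header (int format id, int obsolete version, long log file number,
 * long PrevLogRecord), log record wrappers, then an int end marker of 0.
 */
final class LogFileSketch {
    static final int HEADER_SIZE = 4 + 4 + 8 + 8;
    /** Leading int length + long instant + trailing int length. */
    static final int WRAPPER_OVERHEAD = 4 + 8 + 4;

    /** Builds a log file image; every record must be non-empty. */
    static byte[] write(int formatId, long logFileNumber, long prevLogRecord,
                        long[] instants, byte[][] records) {
        int size = HEADER_SIZE + 4; // header plus end marker
        for (byte[] record : records) {
            if (record.length == 0) {
                throw new IllegalArgumentException("a log record is never empty");
            }
            size += WRAPPER_OVERHEAD + record.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(formatId)
           .putInt(0) // obsolete version; 0 is an arbitrary choice here
           .putLong(logFileNumber)
           .putLong(prevLogRecord);
        for (int i = 0; i != records.length; i++) {
            buf.putInt(records[i].length)
               .putLong(instants[i])
               .put(records[i])
               .putInt(records[i].length);
        }
        buf.putInt(0); // EndMarker
        return buf.array();
    }

    /** Reads the log file number from the header (bytes 8 to 15). */
    static long logFileNumber(byte[] file) {
        return ByteBuffer.wrap(file).getLong(8);
    }

    /** Scans forward from the header and counts log records up to the end marker. */
    static int countRecords(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.position(HEADER_SIZE);
        int count = 0;
        while (buf.remaining() >= 4) {
            int length = buf.getInt();
            if (length == 0) {
                break; // EndMarker, possibly followed by fuzzy-end zeros
            }
            // Skip the instant, the record body and the trailing length.
            buf.position(buf.position() + 8 + length + 4);
            count++;
        }
        return count;
    }
}
```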
</section>
<section>
<title>Format of the log record wrapper</title>
<p>The log record wrapper provides information for the log scan.</p>
<table>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td>int</td>
<td>length - length of the log record (for forward scan)</td>
</tr>
<tr>
<td>long</td>
<td>instant - LogInstant of the log record</td>
</tr>
<tr>
<td>byte[length]</td>
<td>logRecord - byte array that is written by the FileLogger</td>
</tr>
<tr>
<td>int</td>
<td>length - length of the log record (for backward scan)</td>
</tr>
</table>
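<p>
Storing the length at both ends of the wrapper is what allows the log to be scanned in
either direction. The sketch below is a hypothetical illustration (<code>WrapperScan</code>
is not a Derby class). A forward scan reads the leading length to find the next wrapper;
a backward scan reads the trailing length just before the current position to find the
start of the previous wrapper. As an extra check that is not described in this document,
the backward scan compares both copies of the length.
</p>

```java
import java.nio.ByteBuffer;
import java.util.stream.LongStream;

/**
 * Hypothetical helpers for the wrapper layout:
 * int length, long instant, byte[length] record, int length.
 */
final class WrapperScan {
    static final int OVERHEAD = 4 + 8 + 4;

    /** Appends one wrapper at the buffer's current position. */
    static void writeWrapper(ByteBuffer buf, long instant, byte[] record) {
        buf.putInt(record.length).putLong(instant).put(record).putInt(record.length);
    }

    /** Returns the instants of the wrappers between start and end, oldest first. */
    static long[] instantsForward(ByteBuffer buf, int start, int end) {
        LongStream.Builder instants = LongStream.builder();
        int pos = start;
        while (end > pos) {
            int length = buf.getInt(pos); // leading length
            instants.add(buf.getLong(pos + 4));
            pos += OVERHEAD + length;
        }
        return instants.build().toArray();
    }

    /** Returns the same instants newest first, walking back from the end offset. */
    static long[] instantsBackward(ByteBuffer buf, int start, int end) {
        LongStream.Builder instants = LongStream.builder();
        int pos = end;
        while (pos > start) {
            int length = buf.getInt(pos - 4); // trailing length of the previous wrapper
            int wrapperStart = pos - (OVERHEAD + length);
            if (buf.getInt(wrapperStart) != length) {
                throw new IllegalStateException("leading and trailing lengths differ at " + wrapperStart);
            }
            instants.add(buf.getLong(wrapperStart + 4));
            pos = wrapperStart;
        }
        return instants.build().toArray();
    }
}
```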
</section>
<section>
<title>The format of a log record</title>
<p>Each change to the persistent store is described by a log record.</p>
<table>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td>int</td>
<td>format_id, set to LOG_RECORD. The formatId is written by FormatIdOutputStream when this object is written out by writeObject.
</td>
</tr>
<tr>
<td>CompressedInt</td>
<td><p>loggable group - the loggable's group value.</p>
<p>
Each loggable belongs to one or more groups of similar functionality.
</p>
<p>
Grouping is a way to quickly sort out log records that are interesting
to different modules or different implementations.
</p>
<p>
When a module creates a loggable and sends it to the log, it must mark
this loggable with one or more of the following groups. If none fit, or if the loggable encompasses functionality that is not
described in existing groups, then a new group should be introduced. </p>
<p>
Grouping has no effect on how the record is logged or how it is treated
in rollback or recovery.
</p>
<p>
The following groups are defined. This list serves as the registry of
all loggable groups.
</p>
<table>
<caption>Loggable Groups</caption>
<tr>
<th>Name</th>
<th>Value</th>
<th>Description</th>
</tr>
<tr>
<td>FIRST</td>
<td>0x1</td>
<td>The first operation of a transaction.</td>
</tr>
<tr>
<td>LAST</td>
<td>0x2</td>
<td>The last operation of a transaction.</td>
</tr>
<tr>
<td>COMPENSATION</td>
<td>0x4</td>
<td>A compensation log record.</td>
</tr>
<tr>
<td>BI_LOG</td>
<td>0x8</td>
<td>A BeforeImage log record.</td>
</tr>
<tr>
<td>COMMIT</td>
<td>0x10</td>
<td>The transaction committed.</td>
</tr>
<tr>
<td>ABORT</td>
<td>0x20</td>
<td>The transaction aborted.</td>
</tr>
<tr>
<td>PREPARE</td>
<td>0x40</td>
<td>The transaction prepared.</td>
</tr>
<tr>
<td>XA_NEEDLOCK</td>
<td>0x80</td>
<td>Locks associated with this log record need to be reclaimed during recovery of XA prepared transactions.</td>
</tr>
<tr>
<td>RAWSTORE</td>
<td>0x100</td>
<td>A log record generated by the raw store.</td>
</tr>
<tr>
<td>FILE_RESOURCE</td>
<td>0x400</td>
<td>Related to "non-transactional" files.</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>TransactionId</td>
<td>xactId - the transaction this log record belongs to.</td>
</tr>
<tr>
<td>Loggable</td>
<td>op - the log operation</td>
</tr>
</table>
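<p>
Because the groups are single-bit values, a loggable's group is a bit mask that can be
tested without deserializing the operation. The loggable group precedes the transaction id
and the operation in the record layout, so a reader could filter on it first. The sketch
below copies the values from the Loggable Groups table into a hypothetical
<code>LoggableGroups</code> class; the class and its helper methods are illustrative, not
Derby's API. A membership test uses the fact that OR-ing a flag into a mask leaves the mask
unchanged exactly when the flag's bits are already set.
</p>

```java
/**
 * Standalone copy of the loggable group values from the table, with
 * helpers that decode a group bit mask. Names are hypothetical.
 */
final class LoggableGroups {
    static final int FIRST = 0x1;
    static final int LAST = 0x2;
    static final int COMPENSATION = 0x4;
    static final int BI_LOG = 0x8;
    static final int COMMIT = 0x10;
    static final int ABORT = 0x20;
    static final int PREPARE = 0x40;
    static final int XA_NEEDLOCK = 0x80;
    static final int RAWSTORE = 0x100;
    static final int FILE_RESOURCE = 0x400;

    private static final int[] VALUES = {
        FIRST, LAST, COMPENSATION, BI_LOG, COMMIT,
        ABORT, PREPARE, XA_NEEDLOCK, RAWSTORE, FILE_RESOURCE
    };
    private static final String[] NAMES = {
        "FIRST", "LAST", "COMPENSATION", "BI_LOG", "COMMIT",
        "ABORT", "PREPARE", "XA_NEEDLOCK", "RAWSTORE", "FILE_RESOURCE"
    };

    /** True if every bit of flag is already set in groups. */
    static boolean isInGroup(int groups, int flag) {
        return (groups | flag) == groups;
    }

    /** Lists the group names present in a mask, in table order, separated by "|". */
    static String describe(int groups) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i != VALUES.length; i++) {
            if (isInGroup(groups, VALUES[i])) {
                if (sb.length() > 0) {
                    sb.append('|');
                }
                sb.append(NAMES[i]);
            }
        }
        return sb.toString();
    }
}
```

<p>
For example, a hypothetical mask built as <code>LAST | COMMIT | RAWSTORE</code> is reported
as "LAST|COMMIT|RAWSTORE", and testing it for ABORT returns false.
</p>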
</section>
</section>
</body>
<footer> <legal>
</legal>
</footer>
</document>
