It is likely the database is corrupt now that you booted without letting recovery do its job. Just because there are no applications accessing the db it does not mean it is safe to delete the log files. Derby maintains an in memory cache of data that along with info in the recovery log files works to produce a consistent database. The only time all the data is consistent is after shutting down cleanly with
"shutdown=true" giving the system a chance to write out all it's dirty
cached data (basically it does a checkpoint before shutting down).

You should at least run the consistency checker across all tables to catch obvious corruptions, but even if this passes there may still be problems with
the database.

Did you happen to keep a copy of the problem db with its log files before corrupting it by deleting it's log files? There have been some tools posted to JIRA that can allow looking at what is going on in the log files. If not did you keep a listing of the files.
It would be interesting to at least see the names and sizes of the files
in the log directory.

Do you know if the customer uses 2 phase commit at all? This is a really easy way to keep a very old transaction in the log file. These can stay around after clients crash or exit if not properly committed or
aborted.



Bergquist, Brett wrote:
Thanks for taking the time to respond Knut.  It is much appreciated.

Some information:

The log files total 64Gb of disk space.  So this clearly went way past the 10Mb 
of transaction log.

So the " And the checkpoint will only delete log files older than the oldest 
transaction that's still alive."  That is important I think.   So if there was a 
stuck transaction somehow that occurred on July 12, for example, then this could in 
theory keep the transaction logs around until last night, correct?

Unfortunately the system administrator had already shutdown the database engine 
before he called me.  It would not boot the database in a reasonable time.  I 
was looking at the iostat and it looked like it was doing about 1Mbs and I used 
truss to take a look at the system calls and it was processing transaction log 
files from July 13'th after quite a while (a couple of hours of trying to boot 
the database).  I did a quick calculation and it looked like it would take 
somewhere around 17 hours to boot the database.

I looked at the last online backup that the customer had made and again, it had many thousands of transaction logs in the backup database, so that was not useful either.
I only had one option.  I knew the system was in a quiet state as there was no 
applications accessing the database.  I know this is not recommended but I had 
no choice but to remove the old transaction log files and boot the database.  
It came up okay and is in operation okay so I think it will be alright but it 
could possibly have corruption.  I had to take the chance however.

I am going to monitor the system and use the syscs_diag.transaction_table to 
query the transactions if I see this happen again.  Just a note however, 
something similar did happen a week ago and I looks at the transactions and it 
showed none even though there were thousands of transaction log files around.  
So a question, does the online backup show up as a transaction in the 
syscs_diag.transaction_table?  Also, a week ago, there was no locks as reported 
by the syscs_diag.lock_table (at least the snapshot of querying this table that 
I looked at).

Again if there is anything that anyone can think of that I should look at if I 
see this happen again, please shout out.

Brett

-----Original Message-----
From: Knut Anders Hatlen [mailto:knut.hat...@oracle.com] Sent: Tuesday, July 19, 2011 9:41 AM
To: derby-dev@db.apache.org
Subject: Re: Question on log directory of a derby database

"Bergquist, Brett" <bbergqu...@canoga.com> writes:

I have a database in production that has been running fine for a few
years. It started out having about 100K inserts per day into it and
now is up to about 4.6M inserts per day and this has been working
fine.

Tonight the customer called because the system was chewing up disk
space. I had the customer restart the database engine and it is taking
a long time to boot the database. I had the customer check the "log"
directory in the database and there were 62K ".dat" files present.

So I am assuming that these are for transactions that have not
committed, correct?

Yes, but they are not cleaned up until a checkpoint has run (by default
that happens when you have 10MB of transaction log), so they may contain
committed transactions too. And the checkpoint will only delete log
files older than the oldest transaction that's still alive.

But for the life of me, I cannot figure out what
transaction could have been in progress and not committed since July
12'th. It seems to me this would have exhausted memory or some other
resource by now.

One other point, an online database backup is done each night by the
customer. Could this trigger anything like this?

Yes, an ongoing online backup will prevent deletion of log files, since
it needs them to track modifications that happen while it copies the
database.

It could also happen if log archiving has been enabled (using the
SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE procedure). You can
tell whether log archiving is enabled by looking for a line that says

  derby.storage.logArchiveMode=true

in the service.properties file in the database directory.

Tonight when running
a utility against the database, the utility failed to acquire locks,
but there should have been nothing else running but this utility and
it is single threaded, so there should have been no lock contention.
It also acts like there is a database backup that is still on going...

I don't think an online backup needs many locks. If you connect to the
database using ij and execute SELECT * FROM SYSCS_DIAG.LOCK_TABLE you
should see which locks are held, which might give some clues.

Right now, I am just waiting for the database to cleanup and boot so
that I can get in and examine it. Is there any shortcut or express way
to to boot the database? Is there any way to monitor the progress of
this boot cleanup?

I don't know of a way to speed it up. There is a flag that makes debug
builds print more info to derby.log during the recovery phase
(-Dderby.debug.true=LogTrace, I think), but it may be too low-level to
get much useful info in this case.

Any thoughts or pointers in trying to figure out what is going on will
be greatly appreciated.

The database in question is Derby 10.5.1

Brett


Reply via email to