We are doing a “truss -p <pid> 2>&1 | grep open”, but what I am seeing right now
is that one complete pass was made through the logs and I thought it was about
done. Now it has gone into a pattern where it processes something like
logxxxx359.dat
logxxxx358.dat
…
logxxxx242.dat
It is updating some “seg0” files (either re-doing or undoing), then sometimes
creates “logxxxx360.dat” (i.e. the next number), and then starts the whole
process over again. It has been doing recovery for over 5 hours now. Ouch indeed.
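In case it helps gauge the pace, below is a minimal sketch (nothing built into
Derby; the class name and usage line are purely illustrative) of a small filter
you could pipe the truss output through, so that each time recovery moves on to
a different log file you get one timestamped line instead of scanning the raw
open() calls by eye.

// Rough sketch, not part of Derby: reads truss output on stdin and prints a
// timestamped line whenever a different logNNNN.dat file shows up, so you can
// see how long recovery spends on each log file.
// Usage (names illustrative):  truss -p <pid> 2>&1 | java LogOpenWatch
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.time.LocalTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogOpenWatch {
    public static void main(String[] args) throws Exception {
        Pattern logFile = Pattern.compile("(log\\d+\\.dat)");
        String last = null;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = logFile.matcher(line);
            if (m.find() && !m.group(1).equals(last)) {
                last = m.group(1);
                // Only report when recovery has moved on to a different log file.
                System.out.println(LocalTime.now() + "  " + last);
            }
        }
    }
}

That at least turns the truss stream into a one-line-per-log-file progress trace.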
Restoring from a backup is about a 5-hour process as well, since the database is
about 550 GB. Just un-tarring the backup file takes a substantial amount of time.
The problem occurred when an external system suddenly issued 128 concurrent
requests to our server, where each request removes tens of thousands of records
in a single transaction. Together these caused lock gridlock, lock timeouts,
cleanup messages being written to derby-?.log (rolling log support), etc. The
system administrator forcefully shut down Derby and restarted it.
I know that deleting tens of thousands of records in one transaction is bad
design. It did not start out at this scale (a few hundred records at first) but
grew into this problem. The item being deleted represents a piece of network
equipment, and we can’t have a partial piece of equipment lying around in the
database. Shortly this will be redesigned to mark the equipment as deleted and
then perform the actual deletion in the background, in little chunks at a time,
but of course the issue arose before that solution was complete.
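For what it is worth, the background purge we have in mind looks roughly like
the sketch below. It is only a sketch under assumed names: a hypothetical
EQUIPMENT table with an EQUIPMENT_ID key and a DELETED flag, and an illustrative
embedded JDBC URL. The point is that each batch is selected, deleted, and
committed on its own, so no single transaction ever holds tens of thousands of
row locks again.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class BackgroundPurge {
    private static final int BATCH_SIZE = 500;  // small enough to keep lock and log work short

    public static void main(String[] args) throws Exception {
        // Illustrative embedded URL; the table and column names are assumptions.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:mydb")) {
            conn.setAutoCommit(false);
            while (true) {
                // Pick up a small batch of keys that the foreground code marked as deleted.
                List<Long> batch = new ArrayList<>();
                try (PreparedStatement sel = conn.prepareStatement(
                         "SELECT EQUIPMENT_ID FROM EQUIPMENT WHERE DELETED = 1 "
                         + "FETCH FIRST " + BATCH_SIZE + " ROWS ONLY");
                     ResultSet rs = sel.executeQuery()) {
                    while (rs.next()) {
                        batch.add(rs.getLong(1));
                    }
                }
                if (batch.isEmpty()) {
                    break;  // nothing left to purge
                }
                // Delete just this batch and commit, so the lock footprint stays small.
                try (PreparedStatement del = conn.prepareStatement(
                         "DELETE FROM EQUIPMENT WHERE EQUIPMENT_ID = ?")) {
                    for (long id : batch) {
                        del.setLong(1, id);
                        del.addBatch();
                    }
                    del.executeBatch();
                }
                conn.commit();
                Thread.sleep(100);  // yield so foreground transactions are not starved
            }
        }
    }
}

Keeping each transaction to a few hundred rows should also mean that if the
purge is ever interrupted, recovery only has a small batch to redo or undo
instead of an entire piece of equipment’s worth of deletes.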
So what would be useful is recovery output along these lines:
Performing database recovery
Starting analysis pass
215 transactions detected to be processed
Starting redo pass
…. anything that could give some feedback
Starting undo pass
…. anything that could give some feedback
> On Apr 20, 2016, at 12:10 AM, Bryan Pendleton <[email protected]>
> wrote:
>
>> Another issue with about 1100 log files needing to be
>> processed after a restart of the database network server.
>
> Ouch. :(
>
>> is any logging that can tell when pass the database recovery is on
>
> Perhaps use an operating system level monitoring tool (Process Monitor
> on Windows, strace on Linux, etc.) to see if you can watch the store
> opening and accessing each log file.
>
> I'm supposing that the recovery processing essentially reads those
> logs sequentially, so if you watch it for a while you can see how
> long it takes to finish one log and move on to the next?
>
> bryan
>
Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300