Because the online backup was taking a long time and affecting performance, and
the customer's system uses the ZFS file system on Solaris, I wrote a utility
that does the following:


1.       Freezes the database

2.       Invokes a system command to perform a ZFS snapshot

3.       Unfreezes the database

4.       Creates a backup of the ZFS snapshot using 'tar' and 'compress'

5.       Removes the ZFS snapshot
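
As a rough illustration of the five steps above, here is a minimal sketch of
the SQL and command lines involved. It is not the actual utility: the class
name, the dataset name "tank/db", the mount point, the snapshot name and the
output path are all hypothetical. The two CALL statements are Derby's freeze
and unfreeze system procedures, and the snapshot directory assumes the usual
ZFS layout of <mountpoint>/.zfs/snapshot/<name>.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that only builds the statements and commands; it runs nothing. */
final class ZfsBackupSteps {
    // Steps 1 and 3: Derby system procedures, executed over JDBC.
    static final String FREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()";
    static final String UNFREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()";

    // Step 2: zfs snapshot <dataset>@<name>
    static List<String> snapshotCommand(String dataset, String name) {
        return Arrays.asList("zfs", "snapshot", dataset + "@" + name);
    }

    // Step 4: archive the read-only snapshot directory with tar and compress.
    // A shell is needed for the pipe; paths are assumed not to contain quotes.
    static List<String> archiveCommand(String mountPoint, String name, String outFile) {
        String snapDir = mountPoint + "/.zfs/snapshot/" + name;
        String pipeline = "cd '" + snapDir + "' && tar cf - . | compress > '" + outFile + "'";
        return Arrays.asList("sh", "-c", pipeline);
    }

    // Step 5: zfs destroy <dataset>@<name>
    static List<String> destroyCommand(String dataset, String name) {
        return Arrays.asList("zfs", "destroy", dataset + "@" + name);
    }

    public static void main(String[] args) {
        // Print the steps in order, using made-up names.
        System.out.println(FREEZE_SQL);
        System.out.println(snapshotCommand("tank/db", "nightly"));
        System.out.println(UNFREEZE_SQL);
        System.out.println(archiveCommand("/tank/db", "nightly", "/backup/db.tar.Z"));
        System.out.println(destroyCommand("tank/db", "nightly"));
    }
}
```

Each returned list could be handed to a ProcessBuilder. Only the snapshot
command has to run while the database is frozen; the archive step works from
the snapshot afterwards.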

The ZFS snapshot takes about 1 or 2 seconds, so the time between step 1 and
step 3 is only a couple of seconds. The utility has checks to make sure that
if step 1 succeeds, step 3 is always attempted. The basic logic looks like:

    private void run(String[] args) {
        parseArguments(args);
        loadDbDriver();
        final Connection conn = openDatabaseConnection();

        int res = 0;
        try {
            // Safety net: unfreeze if the JVM shuts down while frozen.
            Thread shutdownHook = new Thread() {
                @Override
                public void run() {
                    attemptToUnfreezeDatabase(conn);
                }
            };
            Runtime.getRuntime().addShutdownHook(shutdownHook);
            freezeDatabase(conn);
            try {
                // Take the ZFS snapshot while the database is frozen.
                res = executeCopyCommand();
            } finally {
                unfreezeDatabase(conn);
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        } finally {
            closeDatabaseConnection(conn);
        }

        System.exit(res);
    }

So it registers a shutdown hook and also runs the system-level ZFS snapshot
command inside a try/finally block; both are there to ensure that the unfreeze
is done if the freeze was done. This has been working really well each night
for about 2 months, but Saturday night something failed.

From the stack traces of the Derby engine, it appears that something caused
the utility to fail after the database was frozen, and neither the shutdown
hook nor the try/finally unfroze the database. From that point on, the
database was effectively locked up. The system was still operating, and
connections kept being made to try to access the database until all of the
connections were exhausted.

So I was thinking that maybe the database engine should have some sort of
protection in case this happens. Maybe the database engine should
automatically unfreeze the database if the connection that froze it
terminates or closes. Or maybe a timer could be added to the freeze command to
automatically unfreeze the database after a given timeout.
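
As a rough illustration of the timer idea, here is a minimal client-side
sketch. It is not an engine feature and not part of the utility: the class
FreezeWatchdog and its methods are hypothetical, and the unfreeze action is
passed in as a plain Runnable (in practice it would issue the unfreeze call on
the freezing connection). The action runs at most once, either when the
timeout expires or when the caller closes the watchdog, whichever comes first.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical watchdog: runs the unfreeze action once, on timeout or on close(). */
final class FreezeWatchdog implements AutoCloseable {
    // Daemon thread so the watchdog never keeps the JVM alive by itself.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "freeze-watchdog");
                t.setDaemon(true);
                return t;
            });
    private final AtomicBoolean unfrozen = new AtomicBoolean(false);
    private final Runnable unfreezeAction;

    FreezeWatchdog(Runnable unfreezeAction, long timeout, TimeUnit unit) {
        this.unfreezeAction = unfreezeAction;
        // If close() has not been called by the deadline, unfreeze anyway.
        timer.schedule(() -> { unfreezeOnce(); }, timeout, unit);
    }

    /** Runs the action only for the first caller; returns true if it ran. */
    boolean unfreezeOnce() {
        if (unfrozen.compareAndSet(false, true)) {
            unfreezeAction.run();
            return true;
        }
        return false;
    }

    /** Normal path: unfreeze now (if not already done) and stop the timer. */
    @Override
    public void close() {
        try {
            unfreezeOnce();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

A caller would freeze the database, create the watchdog in a
try-with-resources block, and take the snapshot inside it. A timer like this
lives in the utility's own JVM, so it cannot help if that process is killed
outright, which is why protection inside the engine would be more robust.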

I am thinking this because of an earlier email thread. At that time I was
trying to build this utility purely as a script, using IJ to freeze the
database, SH to perform the ZFS snapshot and IJ to unfreeze the database, and
I was told that the freeze and unfreeze were not expected to be done from
separate connections. In fact, I ran into a problem with the utility at that
point: the IJ connection to unfreeze could not be created because the database
was frozen.

So my question is: is there ever a use case that would require a database to
be frozen and not unfrozen before the connection is closed or lost?

