RE: [DB2EUG] Re. Backup/Recovery Strategy for EEE VLDB

Reys, Ellen Mon, 24 Feb 2003 12:34:10 -0800

Frank,

You drew a very pessimistic picture :( 
It looks like the problem you have encountered came from


1) taking online full database backup
2) trying to rollforward before attempting to drop NONRECOVERABLE
tablespaces

We are planning on take an off-line backup. We do have on-line table spaces
backup, though. 

My understanding is that: if your loads had NONRECOVERABLE option in their
control files, you should not have even planned to recover these table
spaces, rather you should have dropped them prior to doing "rollforward"
after the restore completed, then rollforward, and then re-create and
re-load the dropped tablespaces/tables after the rollforward. Am I missing
something here?

Ellen Klebaner-Reys
Data Management Services
Inovant - a Visa Solutions Company
[EMAIL PROTECTED]/650-432-1746  m/s: 3125-1D



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, February 24, 2003 4:30 AM
To: Reys, Ellen
Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]';
[EMAIL PROTECTED]
Subject: Re: [DB2EUG] Re. Backup/Recovery Strategy for EEE VLDB



Your EEE environment sounds very similar to ours with all the same issues.
We found a very painful shortcoming in DB2 AIX, (little DB2 as I call it)
that caused us to be unable to recover some of our tables when a LOAD went
awry and the applications people did not know the status of many of the
tables in the warehouse.  We found that there is a feature in little DB2
that if you recover from an online backup  when you preform a roll forward
recovery even if it is a minute after the copy occurred you have the
potential to be sent back in time on the log to before the nonrecoverable
load occurred.  What happens is that a dirty page somehow stays in memory
for a very long time due to low logging activity. In our case it was 2 days
prior.  We too use many nonrecoverable loads so logging is pretty minimal.

The roll forward went back to the log where that page was changed and
rolled forward.  When it hit the nonrecoverable load on certain fact tables
DB2 marked the tables inconsistent. Even though the data in the tables
after the recovery was perfectly fine. As there is no force start like
mainframe (Big DB2) has. and we do not, YET, have DB2 UNLOAD tool so we
could not unload from the backup and the load the data into the table we
had no options. We were told by IBM that our only option was to drop and
recreate the table and recreate all the data.  Needless to say we were
appalled.  A DBA's worst nightmare had occurred.  We had to go to our users
and tell them they had to rebuild their data as we could not recover the
tables.

We have had several discussion with developers in Toronto and have
discovered that when little DB2 takes a soft checkpoint and moves the
header pointer forward in the log is not an easily controlled event by
using commands.  There are parms  SOFTMAX and CHNGPGF_THRES (sp?)  that
determine when a soft checkpoint occurs, but in large warehouses with huge
logs and huge buffers even set to their lowest values there is still this
exposure.  Even if a force  archive command is issued this soft checkpoint
is NOT taken.  The ONLY solution is to break all threads and close the
Database before an backup is done.  By closing the Database all dirty pages
with committed data are externalized to the disk.  IBM's suggestion was the
only way to avoid this situation is to always take an off-line copy after
every load.  A very bad answer in our opinion.

After discussing the issue with IBM the developers have accepted a PMR that
will result in  change to the DB2 code so that when an ARCHIVE LOG command
is issued , that command will also cause a soft checkpoint to be taken.
Until this patch is made available you do have this exposure and it is
advisable to close the Database and take an off-line copy (in our case) at
least once a day.  This news certainly has upset our applecart as this was
not part of the plan.  We too have many MARTS and FACT tables that are
loaded at different times in the day.  We have decided that until we have
this fix,  every night or rather early every morning we will close the
Database and take an off-line copy, not our first choice and even this
solution has potential for some degree of exposure.

Hopefully IBM will move quickly with a solution to this problem.

Frank



 

                      "Reys, Ellen"

                      <[EMAIL PROTECTED]        To:
"'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>

                      m>                       cc:
"'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>

                      Sent by:                 Subject:  [DB2EUG] Re.
Backup/Recovery Strategy for EEE VLDB                            
                      [EMAIL PROTECTED]

                      ash.org

 

 

                      02/24/2003 03:04

                      AM

                      Please respond to

                      "Reys, Ellen"

 

 





Hello,

This goal of this email is to solicit suggestions to correct the gap in our
current backup/recovery strategy and to implement a new (possibly better)
strategy for the future environment - both represent 24 x 7 large "active"
warehouse under the DB2 UDB EEE. "Active" as the new industry definition
means that it encompasses multiple data marts and its processing covers
much
more than complex reports, i.e. it accommodates daily Java based "inserts"
with some "updates", daily autoloader jobs, daily and monthly Cognos cubes
building processes, monthly and quarterly purging via "delete", etc.

Current Processing Environment:

RS/6000-S85
             24 CPU
             56 GB
ESS Storage (2 ESS)
             6.2 TB total allocated
             ~4.5 TB (data + indexes)
AIX 4.3.3
DB2 UDB EEE 7.2 FixPak4
             17 logical nodes (16 data and 1 catalog)

12-15 MSTR complex reports (MicroStrategy queries)
2 Cognos Transformers
2 Autoloaders
Hourly up to 10 Java based insert jobs (with a few updates)
----------------------------------------------------------------------------

---------------------------------------------

NOTE: though officially we have weekly maintenance window every Sunday, we
can only guarantee that queries will not run. The jobs have so many
dependencies on the files that arrive from MVS that it is almost impossible
to automate the time for maintenance window. However, we will attempt to do
so.

Backup/Recovery Problem:

Our backup strategy consists of critical tablespace level online backups,
starting with the catalog tablespace. It has been decided during the
initial
database design stage that due to budget constraint, the rest of the
tablespaces can be re-done, and since the data files are available, by
using
autoloader tables can be easily re-loaded. We have also all the archive
logs
since the activation of the log retain and user exit parameters. To turn
the
logging ON, we fooled the database by taking a full database backup to
/dev/null. Also, our tablespaces that participate in the autoloader, have
NONRECOVERABLE option in their control files to prevent "backup pending"
mode. Our backup strategy without having a full database backup image means
that we will be able to recover critical tablespaces, but will not be able
to recover a database in case of disaster scenario.

Our preliminary backup plan to correct the existing gap is as following:

once a month full off-line database backup
weekly tablespace backup of the same critical tablespaces
keep the archive logs from the last successful full database backup

For example:

Backup

full database backup off-line (~4.5 TB)  Sunday 3/02/03
Catalog and critical tablespaces on-line backup Sunday 3/09/03; 3/16/03;
3/23/03; 3/30/03
Keep the archived logs from after the db backup on 3/02/03
Next full database backup off-line 4/06/03

Recovery

Assuming, that the database crashed on 3/25 and needs to be recovered
Is this would be the best plan of actions?

1) Restore the database from the previous good full db backup image
(3/02/03)
2) Rollforward to end of logs

OR

1) Restore the database from the previous good full db backup image
(3/02/03)
2) Restore tablespaces from the last good images (3/23/03)
2) Rollforward ?

I am sure that there are some UDB users who have implemented
backup/recovery
strategies for VLDB. I am writing to solicit your expert advice.

Because of this exposure and a very tight schedule to correct it, I would
like to limit our testing to only those scenarios that are 1) fairly easy
to
implement; 2) already known as "working" in similar environments
===========================================================

I am also looking for any ideas/suggestions about the future
backup/recovery
strategy, which may not be limited to the database utilities only, which
will be implemented in the following environment:

Future Processing Environment:

1 P690 divided into 2 LPARs, with total of 32 CPU and 64 GB of memory
             3 ESS
             15 or more TB of total allocated disk
AIX 5.1 (possibly 5.2 by the end of Oct. 2003)
DB2 UDB EEE 7.2 FixPak8 initially (V. 8 64-bit possibly by the end of Oct.
2003)

I understand that it is always a trade-off between cost and best solution.
That is why I would like to consider and prepare multiple options for our
customers.

Thanks in advance for your help!

Ellen Klebaner-Reys
Data Management Services
Inovant - a Visa Solutions Company
[EMAIL PROTECTED]/650-432-1746  m/s: 3125-1D


-
:::  When replying to the list, please use 'Reply-All' and make sure
:::  a copy goes to the list ([EMAIL PROTECTED]).
***  To unsubscribe, send 'unsubscribe' to [EMAIL PROTECTED]
***  For more information, check http://www.db2eug.uni.cc




-
:::  When replying to the list, please use 'Reply-All' and make sure
:::  a copy goes to the list ([EMAIL PROTECTED]).
***  To unsubscribe, send 'unsubscribe' to [EMAIL PROTECTED]
***  For more information, check http://www.db2eug.uni.cc

RE: [DB2EUG] Re. Backup/Recovery Strategy for EEE VLDB

Reply via email to