Frank, You drew a very pessimistic picture :( It looks like the problem you have encountered came from
1) taking online full database backup 2) trying to rollforward before attempting to drop NONRECOVERABLE tablespaces We are planning on take an off-line backup. We do have on-line table spaces backup, though. My understanding is that: if your loads had NONRECOVERABLE option in their control files, you should not have even planned to recover these table spaces, rather you should have dropped them prior to doing "rollforward" after the restore completed, then rollforward, and then re-create and re-load the dropped tablespaces/tables after the rollforward. Am I missing something here? Ellen Klebaner-Reys Data Management Services Inovant - a Visa Solutions Company [EMAIL PROTECTED]/650-432-1746 m/s: 3125-1D -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, February 24, 2003 4:30 AM To: Reys, Ellen Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]'; [EMAIL PROTECTED] Subject: Re: [DB2EUG] Re. Backup/Recovery Strategy for EEE VLDB Your EEE environment sounds very similar to ours with all the same issues. We found a very painful shortcoming in DB2 AIX, (little DB2 as I call it) that caused us to be unable to recover some of our tables when a LOAD went awry and the applications people did not know the status of many of the tables in the warehouse. We found that there is a feature in little DB2 that if you recover from an online backup when you preform a roll forward recovery even if it is a minute after the copy occurred you have the potential to be sent back in time on the log to before the nonrecoverable load occurred. What happens is that a dirty page somehow stays in memory for a very long time due to low logging activity. In our case it was 2 days prior. We too use many nonrecoverable loads so logging is pretty minimal. The roll forward went back to the log where that page was changed and rolled forward. When it hit the nonrecoverable load on certain fact tables DB2 marked the tables inconsistent. Even though the data in the tables after the recovery was perfectly fine. As there is no force start like mainframe (Big DB2) has. and we do not, YET, have DB2 UNLOAD tool so we could not unload from the backup and the load the data into the table we had no options. We were told by IBM that our only option was to drop and recreate the table and recreate all the data. Needless to say we were appalled. A DBA's worst nightmare had occurred. We had to go to our users and tell them they had to rebuild their data as we could not recover the tables. We have had several discussion with developers in Toronto and have discovered that when little DB2 takes a soft checkpoint and moves the header pointer forward in the log is not an easily controlled event by using commands. There are parms SOFTMAX and CHNGPGF_THRES (sp?) that determine when a soft checkpoint occurs, but in large warehouses with huge logs and huge buffers even set to their lowest values there is still this exposure. Even if a force archive command is issued this soft checkpoint is NOT taken. The ONLY solution is to break all threads and close the Database before an backup is done. By closing the Database all dirty pages with committed data are externalized to the disk. IBM's suggestion was the only way to avoid this situation is to always take an off-line copy after every load. A very bad answer in our opinion. After discussing the issue with IBM the developers have accepted a PMR that will result in change to the DB2 code so that when an ARCHIVE LOG command is issued , that command will also cause a soft checkpoint to be taken. Until this patch is made available you do have this exposure and it is advisable to close the Database and take an off-line copy (in our case) at least once a day. This news certainly has upset our applecart as this was not part of the plan. We too have many MARTS and FACT tables that are loaded at different times in the day. We have decided that until we have this fix, every night or rather early every morning we will close the Database and take an off-line copy, not our first choice and even this solution has potential for some degree of exposure. Hopefully IBM will move quickly with a solution to this problem. Frank "Reys, Ellen" <[EMAIL PROTECTED] To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]> m> cc: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]> Sent by: Subject: [DB2EUG] Re. Backup/Recovery Strategy for EEE VLDB [EMAIL PROTECTED] ash.org 02/24/2003 03:04 AM Please respond to "Reys, Ellen" Hello, This goal of this email is to solicit suggestions to correct the gap in our current backup/recovery strategy and to implement a new (possibly better) strategy for the future environment - both represent 24 x 7 large "active" warehouse under the DB2 UDB EEE. "Active" as the new industry definition means that it encompasses multiple data marts and its processing covers much more than complex reports, i.e. it accommodates daily Java based "inserts" with some "updates", daily autoloader jobs, daily and monthly Cognos cubes building processes, monthly and quarterly purging via "delete", etc. Current Processing Environment: RS/6000-S85 24 CPU 56 GB ESS Storage (2 ESS) 6.2 TB total allocated ~4.5 TB (data + indexes) AIX 4.3.3 DB2 UDB EEE 7.2 FixPak4 17 logical nodes (16 data and 1 catalog) 12-15 MSTR complex reports (MicroStrategy queries) 2 Cognos Transformers 2 Autoloaders Hourly up to 10 Java based insert jobs (with a few updates) ---------------------------------------------------------------------------- --------------------------------------------- NOTE: though officially we have weekly maintenance window every Sunday, we can only guarantee that queries will not run. The jobs have so many dependencies on the files that arrive from MVS that it is almost impossible to automate the time for maintenance window. However, we will attempt to do so. Backup/Recovery Problem: Our backup strategy consists of critical tablespace level online backups, starting with the catalog tablespace. It has been decided during the initial database design stage that due to budget constraint, the rest of the tablespaces can be re-done, and since the data files are available, by using autoloader tables can be easily re-loaded. We have also all the archive logs since the activation of the log retain and user exit parameters. To turn the logging ON, we fooled the database by taking a full database backup to /dev/null. Also, our tablespaces that participate in the autoloader, have NONRECOVERABLE option in their control files to prevent "backup pending" mode. Our backup strategy without having a full database backup image means that we will be able to recover critical tablespaces, but will not be able to recover a database in case of disaster scenario. Our preliminary backup plan to correct the existing gap is as following: once a month full off-line database backup weekly tablespace backup of the same critical tablespaces keep the archive logs from the last successful full database backup For example: Backup full database backup off-line (~4.5 TB) Sunday 3/02/03 Catalog and critical tablespaces on-line backup Sunday 3/09/03; 3/16/03; 3/23/03; 3/30/03 Keep the archived logs from after the db backup on 3/02/03 Next full database backup off-line 4/06/03 Recovery Assuming, that the database crashed on 3/25 and needs to be recovered Is this would be the best plan of actions? 1) Restore the database from the previous good full db backup image (3/02/03) 2) Rollforward to end of logs OR 1) Restore the database from the previous good full db backup image (3/02/03) 2) Restore tablespaces from the last good images (3/23/03) 2) Rollforward ? I am sure that there are some UDB users who have implemented backup/recovery strategies for VLDB. I am writing to solicit your expert advice. Because of this exposure and a very tight schedule to correct it, I would like to limit our testing to only those scenarios that are 1) fairly easy to implement; 2) already known as "working" in similar environments =========================================================== I am also looking for any ideas/suggestions about the future backup/recovery strategy, which may not be limited to the database utilities only, which will be implemented in the following environment: Future Processing Environment: 1 P690 divided into 2 LPARs, with total of 32 CPU and 64 GB of memory 3 ESS 15 or more TB of total allocated disk AIX 5.1 (possibly 5.2 by the end of Oct. 2003) DB2 UDB EEE 7.2 FixPak8 initially (V. 8 64-bit possibly by the end of Oct. 2003) I understand that it is always a trade-off between cost and best solution. That is why I would like to consider and prepare multiple options for our customers. Thanks in advance for your help! Ellen Klebaner-Reys Data Management Services Inovant - a Visa Solutions Company [EMAIL PROTECTED]/650-432-1746 m/s: 3125-1D - ::: When replying to the list, please use 'Reply-All' and make sure ::: a copy goes to the list ([EMAIL PROTECTED]). *** To unsubscribe, send 'unsubscribe' to [EMAIL PROTECTED] *** For more information, check http://www.db2eug.uni.cc - ::: When replying to the list, please use 'Reply-All' and make sure ::: a copy goes to the list ([EMAIL PROTECTED]). *** To unsubscribe, send 'unsubscribe' to [EMAIL PROTECTED] *** For more information, check http://www.db2eug.uni.cc
