On 9/28/09 2:47 PM, "Hallock, Arthur T" <[email protected]> wrote:
> We have all VM and Linux volumes on a single physical controller EMC DMX-1000.
> On z/OS I run a batch job to execute an EMCSNAP utility. EMCSNAP allows me to
> identify the volumes to snap and to make a consistent (point-in-time) copy.
> The controller handles the I/O activity such that a point-in-time copy is made
> to the target volumes.

Here's the concern I raised: that utility (and the disk controller) has no way to know about anything beyond what was actually committed to disk via real I/O. It cannot know what is cached inside a virtual machine, possibly on another LPAR or even another machine. You cannot get a consistent image because the data isn't within reach of the EMCSNAP utility yet.

> We do perform database backups under the Linux OS. They can be used for local
> site recovery and are available at the DR site (because they are on the DASD
> that is mirrored and snapped/dumped to tape).

If you've dumped your databases from the Linux systems, using the utility provided by the database vendor, to tape or to disk volumes that are controlled by the Linux system and are NOT used by the production database, and then snapped those volumes, your solution will probably work. In that case the Linux guest IS aware of what has/hasn't actually been written to disk, and can compensate. If you initiate the snap from OUTSIDE the guest, see previous comment.

> Since VM and most of the servers are static (little I/O), I don't expect
> problems getting them started at the DR site.

VM, maybe, in that you can always do a cold start and run without saved segments until you get time to recreate them. For Linux, the "lightly loaded" aspect actually makes the problem worse, in that there is no pressure on the Linux guest to sync unwritten data to disk to free up space. It'll get around to it eventually, but your exposure window is somewhat longer if there's little activity.

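To make the exposure window concrete, here is a toy model in Python. It is only a sketch: the Controller and Guest classes and the block names are made up for illustration, and nothing here talks to EMCSNAP, a DMX, or the real Linux page cache. It just shows that a snap taken from outside the guest can only contain what the guest has already flushed.

```python
class Controller:
    """Toy disk controller: it only knows blocks that reached it via real I/O."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data

    def snap(self):
        # Point-in-time copy of everything the controller has actually received.
        return dict(self.blocks)


class Guest:
    """Toy Linux guest with a write-back cache (hypothetical, for illustration)."""

    def __init__(self, controller):
        self.controller = controller
        self.dirty = {}  # written by applications, not yet sent to the controller

    def write(self, block, data):
        # The application sees the write complete, but it only lands in the cache.
        self.dirty[block] = data

    def sync(self):
        # Flush every dirty block to the controller, then clear the cache.
        for block, data in self.dirty.items():
            self.controller.write(block, data)
        self.dirty.clear()


def demo():
    ctl = Controller()
    guest = Guest(ctl)

    guest.write("log-1", "BEGIN txn 42")
    guest.sync()                        # this part reaches the hardware
    guest.write("log-2", "COMMIT txn 42")

    snap_outside = ctl.snap()           # snap initiated outside the guest
    guest.sync()
    snap_after_sync = ctl.snap()        # snap taken once the guest has flushed
    return snap_outside, snap_after_sync
```

In this model the first snap holds "log-1" but not "log-2", even though the application inside the guest believes both writes are done. On a lightly loaded guest the dirty set simply sits there longer before anything forces the flush.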
> If DB/2 states it can automatically restart/recover after a
> server failure and reboot, then what is the difference between a failure
> (where the cache didn't get written) and a consistent snap?

See above. Consistent snap needs the disk controller to actually have had access to the data in question. If it hasn't been actually passed to the hardware to be written, then the disk controller has no way to know it's there, and thus, can't duplicate it.

> I would think DB/2
> and Oracle would need to somehow compensate for how Linux caches the write
> I/Os. Else their claim to be able to restart/recover from a crash is somewhat
> misleading.

They're depending on the logs to do roll-forward. Lose those or have missing parts, and you're toast, regardless of what platform you're on.

The compromise would be to do your backups using virtual networks to a Linux guest that acts as a backup server using something like Amanda or Bacula. Once you've done the backups, shut down ONLY the backup server guest and snap that.

As mentioned above, I'd probably take the chance on VM coming up in a semi-usable form from a snap backup; it's pretty resilient to such bad treatment, but YMMV. You have to decide how much risk you want to take.

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
