Re: How to ensure I/O write order

David Boyes Tue, 29 Sep 2009 08:27:38 -0700

On 9/28/09 2:47 PM, "Hallock, Arthur T" <[email protected]> wrote:


> We have all VM and Linux volumes on a single physical controller EMC DMX-1000.
> On z/OS I run a batch job to execute an EMCSNAP utility. EMCSNAP allows me to
> identify the volumes to snap and to make a consistent (point-in-time) copy.
> The controller handles the I/O activity such that a point-in-time copy is made
> to the target volumes.

Here's the concern I raised: that utility (and the disk controller) has no
way to know what was done outside of what was actually committed to disk via
real I/O. It cannot know what is cached inside a virtual machine, possibly
on another LPAR or even another machine. You cannot get a consistent image
because the data isn't in reach of the EMCSNAP utility yet.

> We do perform database backups under the Linux OS. They can be used for local
> site recovery and are available at the DR site (because they are on the DASD
> that is mirrored and snapped/dumped to tape).

If you've dumped your databases from the Linux systems, using the utility
provided by the database vendor, to tape or disk controlled by the Linux
system, on volumes that are NOT used by the production database, and then
snapped those volumes, your solution will probably work: the Linux guest IS
aware of what has and hasn't actually been written to disk, and can
compensate. If you initiate the snap from OUTSIDE the guest, see my previous
comment.
 
> Since VM and most of the servers are static (little I/O), I don't expect
> problems getting them started at the DR site.

VM, maybe, in that you can always do a cold start and run without saved
segments until you get time to recreate them.

For Linux, the "lightly loaded" aspect actually makes the problem worse in
that there is no pressure on the Linux guest to sync unwritten data to disk
to free up space. It'll get around to it eventually, but your exposure
window is somewhat longer if there's little activity.
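
One rough way to see that exposure on a given guest is to check how much
data the kernel is still holding as dirty pages. This is a minimal sketch,
assuming a Linux guest with a readable /proc/meminfo; the helper name
dirty_kb is made up for illustration.

```python
import os

def dirty_kb(meminfo_text):
    """Return the 'Dirty' and 'Writeback' values (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Dirty:              1234 kB"
        name, _, rest = line.partition(":")
        if name in wanted:
            result[name] = int(rest.split()[0])
    return result

def report_and_sync(path="/proc/meminfo"):
    """Print the dirty-page counters, ask the kernel to flush, print them again."""
    with open(path) as f:
        print("before sync:", dirty_kb(f.read()))
    os.sync()  # ask the kernel to write out its dirty buffers
    with open(path) as f:
        print("after sync: ", dirty_kb(f.read()))
```

Calling report_and_sync() inside the guest shortly before a snap gives a
feel for how much data the controller has not seen yet. It does not make
the snap consistent by itself, since new writes can arrive right after the
sync returns.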

> If DB/2 states it can automatically restart/recover after a
> server failure and reboot, then what is the difference between a failure
> (where the cache didn't get written) and a consistent snap?

See above. Consistent snap needs the disk controller to actually have had
access to the data in question. If it hasn't been actually passed to the
hardware to be written, then the disk controller has no way to know it's
there, and thus, can't duplicate it.
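
To make "passed to the hardware" concrete: from inside the guest, an
application has to flush and fsync its writes before the data leaves the
guest's memory. A minimal sketch, assuming a POSIX-style system; the
function name write_durably is hypothetical, and this is not what any
particular database does internally.

```python
import os

def write_durably(path, data):
    """Write bytes to path and push them out of the guest's page cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's own buffer -> kernel page cache
        os.fsync(f.fileno())  # kernel page cache -> storage device
```

Until the fsync call returns, the data may exist only in the guest's
memory, which is exactly the part a controller-level snap cannot see.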

> I would think DB/2
> and Oracle would need to somehow compensate for how Linux caches the write
> I/Os. Else their claim to be able to restart/recover from a crash is somewhat
> misleading.

They're depending on the logs to do roll-forward. Lose those or have missing
parts, and you're toast, regardless of what platform you're on.
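
As a toy illustration of why missing log records are fatal for
roll-forward: recovery replays the log, in order, on top of the last good
image, and has to stop if a record is missing. The record layout
(seq, key, value) and the function name roll_forward are assumptions made
up for this sketch; real DB2 or Oracle logs are far more involved.

```python
def roll_forward(snapshot, last_applied, log):
    """Replay log records (seq, key, value) on top of a copy of the snapshot."""
    state = dict(snapshot)
    expected = last_applied + 1
    for seq, key, value in log:
        if seq < expected:
            continue  # already reflected in the snapshot
        if seq != expected:
            raise ValueError(f"log gap: expected record {expected}, found {seq}")
        state[key] = value
        expected += 1
    return state
```

With a complete log, roll_forward({"a": 1}, 1, [(2, "b", 2), (3, "a", 5)])
returns {"a": 5, "b": 2}. If record 2 is missing, the call raises
ValueError instead of producing a wrong image. A log whose tail never
reached disk is not detected this way; recovery simply ends at an earlier
point in time.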

The compromise would be to do your backups using virtual networks to a Linux
guest that acts as a backup server using something like Amanda or Bacula.
Once you've done the backups, shut down ONLY the backup server guest and
snap that. As mentioned above, I'd probably take the chance on VM coming up
in a semi-usable form from a snap backup; it's pretty resilient to such bad
treatment, but YMMV. You have to decide how much risk you want to take. 

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
