On 07/06/12, Richard Elling wrote:

First things first, the panic is a bug. Please file one with your OS
supplier. More below...

Thanks! It helps that it recurred a second night in a row.


On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:


> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
> > On 07/ 6/12 04:17 PM, Ian Collins wrote:
> > > On 07/ 7/12 08:34 AM, Brian Wilson wrote:
> > > > Hello,
> > > >
> > > > I'd like a sanity check from people more knowledgeable than myself.
> > > > I'm managing backups on a production system. Previously I was using
> > > > another volume manager and filesystem on Solaris, and I've just switched
> > > > to using ZFS.
> > > >
> > > > My model is -
> > > > Production Server A
> > > > Test Server B
> > > > Mirrored storage arrays (HDS TruCopy if it matters)
> > > > Backup software (TSM)
> > > >
> > > > Production server A sees the live volumes.
> > > > Test Server B sees the TruCopy mirrors of the live volumes. (it sees
> > > > the second storage array, the production server sees the primary array)
> > > >
> > > > Production server A shuts down zone C, and exports the zpools for
> > > > zone C.
> > > > Production server A splits the mirror to the secondary storage array,
> > > > leaving the mirror writable.
> > > > Production server A re-imports the pools for zone C, and boots zone C.
> > > > Test Server B imports the ZFS pool using -R /backup.
> > > > Backup software backs up the mounted mirror volumes on Test Server B.
> > > >
> > > > Later in the day after the backups finish, a script exports the ZFS
> > > > pools on test server B, and re-establishes the TruCopy mirror between
> > > > the storage arrays.
> > >
> > > That looks awfully complicated. Why don't you just clone a snapshot
> > > and back up the clone?
> >
> > Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
> > lot more IO reading off the disks and going over the network. These
> > aren't acceptable costs in my situation.


Yet it is acceptable to shut down the zones and export the pools?
I'm interested to understand how a service outage is preferred over I/O?


> So splitting a mirror and reconnecting it doesn't incur I/O?


It does.


> > The solution is complicated if you're starting from scratch. I'm
> > working in an environment that already had all the pieces in place
> > (offsite synchronous mirroring, a test server to mount stuff up on,
> > scripts that automated the storage array mirror management, etc). It
> > was set up that way specifically to accomplish short downtime outages for
> > cold backups with minimal or no IO hit to production. So while it's
> > complicated, when it was put together it was also the most obvious thing
> > to do to drop my backup window to almost nothing, and keep all the IO
> > from the backup from impacting production. And like I said, with a
> > different volume manager, it's been rock solid for years.


... where data corruption is blissfully ignored? I'm not sure what volume
manager you were using, but SVM has absolutely zero data integrity
checking :-( And no, we do not miss using SVM :-)

I was trying to avoid sounding like a brand snob ('my old volume manager did X, why doesn't ZFS?'), because that's truly not my attitude; I prefer ZFS. I was using VxVM and VxFS - still no integrity checking, I agree :-)





> > So, to ask the sanity check more specifically -
> > Is it reasonable to expect ZFS pools to be exported, have their luns
> > change underneath, then later import the same pool on those changed
> > drives again?


Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is
simply a bug, perhaps one that is already fixed.

Excellent, that's exactly what I was hoping to hear. Thank you!


> If you were splitting ZFS mirrors to read data from one half all would be
> sweet (and you wouldn't have to export the pool). I guess the question here is
> what does TruCopy do under the hood when you re-connect the mirror?


Yes, this is one of the use cases for zpool split. However, zpool split creates
a new pool, which is not what Brian wants, because reattaching the disks
requires a full resilver. Using TrueCopy as he does is a reasonable approach
for Brian's use case.
-- richard
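
To make the contrast concrete, the zpool split path would look roughly like
this. This is only a sketch: the pool name 'tank' and the device names are
made up, and the last step is the full resilver Richard mentions.

  # Assume a hypothetical pool 'tank' mirrored across c0t0d0 and c0t1d0.
  # Detach one side of each mirror into a brand-new pool 'tankbak':
  zpool split tank tankbak

  # The split-off copy can then be imported elsewhere for the backup:
  zpool import -R /backup tankbak

  # Putting the disk back afterwards means re-attaching it to the
  # original pool, which resilvers the whole mirror from scratch
  # (-f because the disk still carries the old tankbak label):
  zpool export tankbak
  zpool attach -f tank c0t0d0 c0t1d0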


Yep, thanks, and to answer Ian with more detail on what TruCopy does. TruCopy mirrors between the two storage arrays, with software running on the arrays, and keeps a list of dirty/changed 'tracks' while the mirror is split. I think they call it something other than 'tracks' for HDS, but, whatever. When it resyncs the mirrors it sets the target luns read-only (which is why I export the zpools first), and the source array reads the changed tracks and writes them across dedicated mirror ports and fibre links to the target array's dedicated mirror ports, which brings the target luns back up to synchronized.

So, yes, like Richard says, there is IO, but it's isolated to the arrays, and it's scheduled as lower priority on the source array than production traffic. For example, it can take an hour or more to re-synchronize a particularly busy 250 GB lun (though you can do more than one at a time without it taking longer or impacting production any more, unless you choke the mirror links, which we do our best not to do). That lower priority, the dedicated ports on the arrays, etc., all make the impact on the production storage luns, as seen from the production server, as unnoticeable as I can make it in my environment.
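
For the archives, the nightly cycle our scripts run boils down to roughly the
following. The zone name, pool name, and trucopy-* commands are placeholders
standing in for our site-specific names and wrappers around the array CLI;
only the zoneadm and zpool commands are meant literally.

  ## On production server A, at the start of the backup window:
  zoneadm -z zoneC halt                # quiesce the zone
  zpool export zoneC-pool              # flush and export its pool(s)
  /site/bin/trucopy-split zoneC        # placeholder: split the pair, leave the mirror writable
  zpool import zoneC-pool              # production comes right back
  zoneadm -z zoneC boot                # the outage is only these few steps

  ## On test server B, once the split is complete:
  zpool import -R /backup zoneC-pool   # mount the mirror copy under /backup
  # ... TSM backs up the filesystems mounted under /backup ...

  ## On test server B, after the backup finishes:
  zpool export zoneC-pool
  /site/bin/trucopy-resync zoneC       # placeholder: re-establish the pair (targets go read-only)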

Thanks again! Off to file a bug...

Brian



--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S            608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
