[ha-clusters-discuss] HORCM question

Lisa Shepherd Wed, 08 Jul 2009 08:39:36 -0700

Someone please open a doc CR to request that this information be added 
to the replication chapter of the Sun Cluster System Administration 
Guide. Sergei's suggestion is a good one: describe the problem and the
action to take, but refer users to the TrueCopy documentation for the
actual procedures to perform that action.


Thanks.

Lisa Shepherd
Sun Cluster Technical Publications
"We're the M in RTFM"



Sergei Kolodka wrote:
> Stephen, thanks for your definitive answer. I'm hoping I can get the same
> definitive answer from Sun support on the call we logged, unless it comes
> from you, of course ;-)
>
> To be honest, I'd like this to be in the Sun Cluster documentation somewhere,
> in bold and large font, for two reasons.
>
> The first reason: in the company I work for, for example, we have a storage
> team; I'm not really allowed to touch the SAN and don't know much about it,
> and I know quite a few large companies and government departments that work
> the same way. A couple of certified Sun Cluster admins we asked about this
> problem knew little about it and had never seen that behaviour before, and
> the person who designed and built the cluster had no idea that pairresync
> must be done after each failover. If this were in the SC manual, or if the
> SC manual at least referred to the TC manual for this particular case, it
> would greatly help in troubleshooting this issue/feature.
> In fact, I just searched the SC 3.2 documentation on Sun's web site for
> +pairresync +swaps, and surprisingly all 21 results relate to the Geographic
> Edition, which is not quite the same as standard Sun Cluster. There is
> basically very little in the SC manual about troubleshooting and resolving
> problems with non-Geographic SC + TC.
>
> The second reason is, for me, much more important: the possible consequences
> of not doing pairresync after failover. The first time we encountered this
> problem we did not check the pair status; we booted the original node and
> switched the resource groups back without doing pairresync, and as a result
> TrueCopy was completely locked, with no way to do pairresync with either
> -swaps or -swapp. The storage admins had to split the pair in their own SAN
> interface, and only then were we able to start our cluster. This happened by
> accident four weeks before going into production and took a whole day to
> resolve because no one knew what had happened; had it been in production,
> it would have cost us a few million dollars for every hour of downtime. One
> page in the SC manual stating that admins must always do pairresync after
> failover, with at least a reference to the TC manual, could easily have saved
> us that money, and at least five years of an admin's miserable life in a real
> disaster.
>
> There's another closely related question I would like to find an answer to.
> Right now we have Failback set to False for all our RGs. I'd greatly
> appreciate it if you could shed some light on what would happen if Failback
> were set to True, i.e. the first node reboots and the RGs start moving back
> before the admins have had a chance to do pairresync.
>
> Once more, thank you for your help,
> Sergei
>
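The rule Sergei describes (check the pair status and run pairresync before switching resource groups back) can be expressed as a simple pre-failback guard. This is a minimal Python sketch under stated assumptions, not a Sun Cluster or TrueCopy tool: the function names and device-group names are hypothetical, and it assumes the pair status strings have already been read by an admin from pairdisplay output. It conservatively treats any status other than PAIR as unsafe; check the TrueCopy documentation for the actual status values and the resync procedure.

```python
"""Hypothetical pre-failback guard for TrueCopy pairs (illustrative sketch only).

ASSUMPTION: pair status strings (e.g. "PAIR", "PSUS") were collected beforehand,
for example from pairdisplay output. This code runs no HORCM or cluster command.
"""
from typing import Dict, List

# The only status this sketch treats as "both volumes in sync".
SAFE_STATUS = "PAIR"


def failback_blockers(pair_status: Dict[str, str]) -> List[str]:
    """Return the (hypothetical) pair names whose status is not PAIR, sorted."""
    return sorted(
        name
        for name, status in pair_status.items()
        if status.strip().upper() != SAFE_STATUS
    )


def safe_to_fail_back(pair_status: Dict[str, str]) -> bool:
    """True only if at least one pair was checked and every pair is in PAIR."""
    return bool(pair_status) and not failback_blockers(pair_status)


if __name__ == "__main__":
    # Example data is made up: one pair still suspended after a failover.
    status = {"oradata": "PSUS", "oralogs": "PAIR"}
    print(safe_to_fail_back(status))   # False
    print(failback_blockers(status))   # ['oradata']
```

An empty status map is treated as unsafe, so a failed or skipped status check cannot be mistaken for "all pairs in sync".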
