Hi,

please try either lower-cased host/node names or the patch I sent
yesterday. The problem is that heartbeat uses the lower-cased hostnames
as node names in the CCM membership list, while EVMS compares node
names case-sensitively. This means EVMS decides that the cluster node
acquiring the private container is not allowed to do so, because the
CCM does not have the exact node name in its list.

In your case the node names (e.g. CZVLabNode2) are not lower-cased,
and this is the cause of the problem. Either change both node names to
lower case (uname -n and hostname must both report the lower-cased
name), or apply the patch.
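
If you go the renaming route, the steps on each node would look roughly
like this (a sketch for SLES 10; czvlabnode2 and the ha.cf node entries
are taken from your setup, the rest is standard usage):

# set the running hostname to the lower-cased name
hostname czvlabnode2

# make the change persistent across reboots (SLES keeps it in /etc/HOSTNAME)
echo czvlabnode2 > /etc/HOSTNAME

# update the node entries in /etc/ha.d/ha.cf on both nodes:
#   node czvlabnode1
#   node czvlabnode2

# verify that both commands now report the lower-cased name
uname -n
hostname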

After that, use the following procedure to recover from the stored
failures of the evms_failover resource:

1. Cleanup the resource
2. Stop(!) the resource
3. Start the resource

If the resource belongs to a group, finally delete the target_role
attribute of the evms_failover resource.
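
With crm_resource this could look as follows (a sketch; I am assuming
the Heartbeat 2.1.x command-line flags, so please double-check against
your man pages):

# 1. clean up the stored failures on the affected node
crm_resource -C -r evms_failover -H czvlabnode2

# 2. stop, then 3. start the resource via its target_role
crm_resource -r evms_failover -p target_role -v stopped
crm_resource -r evms_failover -p target_role -v started

# if the resource belongs to a group, finally remove target_role
crm_resource -r evms_failover -d target_role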

Now everything should work fine.

Best regards
Fabian

On Friday, 23.11.2007, at 11:16 +0100, Chris wrote:
> Hi Yan,
>             Thanks a lot for your help. I took the evmsSCC
> resource out of the scenario but did not see any difference in the
> system's behavior. I then followed your suggestion and manually
> tested the EVMS commands from the CLI while both nodes were in
> standby, and I realized that the command:
> 
> modify: gwcont,type=private,node=CZVLabNode2
> 
> was failing; it was somehow not recognized as a valid command.
> 
> The really weird thing is that the same command, without the capital
> letters in the host name, was successful:
> 
> modify: gwcont,type=private,node=czvlabnode2
> 
> This was true on both nodes, so I modified both hostnames from:
> 
> CZVLabNode1 --> czvlabnode1
> CZVLabNode2 --> czvlabnode2
> 
> and now the failover is working properly, like everything else.
> 
> The reason I tried changing the host names to avoid any capital
> letters is that I noticed that, even though my host names were a
> mixture of lower-case and capital letters, hb_gui showed them
> without capitals.
> 
> As soon as I have time for this, I will do some further tests to
> see whether I can reproduce this starting from scratch, and so
> verify whether Heartbeat 2.1.2 and/or EVMS 2.5.5-24.52 really have
> issues with partially capitalized node names; I will update the list
> afterwards.
> 
> It could also be that I modified something else in the system that
> I'm not fully aware of, or that I simply forgot, as I ran many
> different tests on the same boxes.
> 
> Thanks again,
>                       Chris
> 
> 
> 
> 
> On Nov 21, 2007 9:23 PM, Yan Fitterer <[EMAIL PROTECTED]> wrote:
> > Andrew Beekhof wrote:
> > >
> > > On Nov 21, 2007, at 10:11 AM, Christian Zemella wrote:
> > >
> > >> Hi All,
> > >>        Has anybody out there managed to get EVMS container resources
> > >> to fail over properly in a 2-node Heartbeat 2 cluster running on
> > >> SLES 10 SP1?
> > >
> > > I believe so... have you read the documentation below?
> > >    http://wiki.novell.com/images/3/37/Exploring_HASF.pdf
> > >
> > >>
> > >>
> > >> In my lab I can only start and stop the resource on the node that has
> > >> the container assigned within EVMS. If I shut down that node,
> > >> the failover does not occur, as the evms_failover resource times
> > >> out; as soon as the downed node comes up again, it takes the
> > >> resource back properly.
> >
> > This would indicate that the evms_failover RA cannot assign the container to
> > the new node. Do you see the resource failing? Have you checked
> > the failcount for the resources on that node?
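> >
> > The failcount can be queried with crm_failcount, e.g. (a sketch,
> > assuming the Heartbeat 2.1.x options):
> >
> > crm_failcount -G -r evms_failover -U CZVLabNode2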
> >
> > Some clues (from evms perspective): take a look in /dev/evms/.nodes
> > When the private container is present on the node, a device file named
> > after the container should appear there.
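> >
> > For example, on the node that should currently own the container:
> >
> > ls -l /dev/evms/.nodes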
> >
> > To test manually, the easiest way is to start HB, then put both nodes
> > on standby, then manipulate the evms devices manually.
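> >
> > Standby can be toggled from hb_gui or from the shell, e.g. (a sketch,
> > assuming Heartbeat 2.1.x's crm_standby tool):
> >
> > crm_standby -U CZVLabNode1 -v true    # put the node into standby
> > crm_standby -U CZVLabNode1 -v false   # bring it back online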
> >
> > To deport the container (on resource stop), evms_failover issues commands
> > to the evms command line tool:
> >
> > modify:"$1",type=deported
> > save
> > exit
> >
> > where $1 is the value of the "1" parameter you've passed to evms_failover.
> >
> > You can try this yourself manually, to verify where the issue is (i.e.
> > with evms or elsewhere).
> >
> > To import the container (when starting the resource), evms_failover does:
> >
> > modify:"$1",node="$HOSTNAME",type=private
> > save
> > exit
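> >
> > For a quick manual test you could wrap both steps in a small script
> > along these lines (a sketch: it assumes the evms CLI accepts its
> > commands on stdin, and uses your container name gwcont):
> >
> > #!/bin/sh
> > # run on the current owner: deport the private container
> > evms <<EOF
> > modify:"gwcont",type=deported
> > save
> > exit
> > EOF
> >
> > # run on the node taking over: import the container as private
> > evms <<EOF
> > modify:"gwcont",node="$(uname -n)",type=private
> > save
> > exit
> > EOF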
> >
> >
> >
> > >>
> > >> In my environment I created the following:
> > >>
> > >> I'm working with 2 VMware boxes sharing one 4 GB plain disk that
> > >> acts as a SAN;
> > >>
> > >> EVMS:
> > >>
> > >> I created a private container (gwcont) on the shared disk using the
> > >> CSM plug-in, and inside it an EVMS volume (gwvol);
> > >> on the volume I made a reiserfs file system;
> > >> I verified that the HA plug-in was working and that the node assigned
> > >> to the container can mount it manually.
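> > >>
> > >> For reference, that manual mount amounts to something like this,
> > >> using the device, fstype and mount point from the resources below:
> > >>
> > >> mount -t reiserfs /dev/evms/gwcont/gwvol /gw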
> > >>
> > >> HB_GUI:
> > >>
> > >> I created a group, ordered and collocated;
> > >> inside the group I created the following resources:
> > >> - evmsSCC --> no attributes, no parameters;
> > >> - evms_failover --> Parameter: 1 Value: gwcont (name of the EVMS
> > >> container)
> > >> - Filesystem --> Parameter: fstype Value: reiserfs; Parameter: device
> > >> Value: /dev/evms/gwcont/gwvol; Parameter: directory Value: /gw;
> > >> - IPAddr --> Parameter: ip Value: xxx.xxx.xxx.xxx
> > >>
> > >> I then created a Location constraint assigning a score of 100 for
> > >> the group to run on Node1, and a second Location constraint
> > >> assigning a score of 50 for the group to run on Node2.
> > >>
> > >> ha.cf:
> > >>
> > >> ***
> > >> autojoin any
> > >> crm true
> > >> ucast eth1 xxx.xxx.xxx.xxx (ipaddress of eth1 on the other node)
> > >> auto_failback off
> > >> node CZVLabNode1
> > >> node CZVLabNode2
> > >> respawn hacluster /usr/lib/heartbeat/ccm
> > >> respawn root /sbin/evmsd
> > >> apiauth evms uid=hacluster,root
> > >> apiauth ccm uid=hacluster,root
> > >> apiauth crm uid=hacluster,root
> > >> ***
> > >>
> > >> On both nodes I issued:
> > >>
> > >> chkconfig boot.evms on
> > >>
> > >> My feeling is that I'm doing something wrong in the configuration;
> > >> can anybody point me to the error I might be making here?
> >
> > There's certainly no use for the evmsSCC resource (SCC stands for....
> > Shared Cluster Container). For private containers, you need to use
> > evms_failover exclusively.
> >
> > You may need to start evms as well (/etc/init.d/evms), not just boot.evms.
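> >
> > For example, on both nodes (standard chkconfig / init-script usage):
> >
> > chkconfig evms on
> > /etc/init.d/evms start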
> >
> > Finally, yes it _does_ work, but it's not flawless, in my experience.
> >
> > HTH
> > Yan
> >
-- 
SUSE LINUX GmbH,
Maxfeldstr. 5, D - 90409 Nürnberg
Phone:  +49 (0)69  - 2174-1923
FaxFFM: +49 (0)69  - 2174-1740
FaxDUS: +49 (0)211 - 5631-3769
e-mail: [EMAIL PROTECTED]


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
