Hi All

I have a 2 node RHEL-5.1 cluster. A quorum disk is configured.
The hosts have 4 NICs. These are bonded:
(eth0+eth2) -> bond0
(eth1+eth3) -> bond1
Unfortunately I was not able to use a dedicated interface for cluster 
communications, so bond1 is being used for them. This is where I think I'm in trouble.

The cluster has been configured using IP addresses. I had to follow 
http://archives.free.net.ph/message/20080130.074958.5c7a211c.en.html
because the hostname is tied to the bond0 IP.

I have not defined the interface to be used by the cluster, just relying on the 
IP address configured.
The cluster's purpose is 2 GFS file systems.

The cluster was configured and working for 4 days before there were problems.

I now have almost constant "token was lost" messages in /var/log/messages. They are 
almost exactly 5 minutes apart. A typical excerpt from the messages file is shown below my 
sig.
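
One way to check the "almost exactly 5 minutes" observation is to compute the gaps between the OPERATIONAL-state token losses directly from the log. The following is only a minimal sketch: the function name `token_loss_gaps` is made up for illustration, it assumes wrapped log lines have been rejoined into one line per message, it assumes an English/C locale for month names, and the year has to be supplied because syslog does not record it.

```python
from datetime import datetime

MARKER = "The token was lost in the OPERATIONAL state"

def token_loss_gaps(lines, year=2008):
    """Return the seconds between consecutive OPERATIONAL-state token losses."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue  # skip COMMIT-state losses and all other messages
        # The syslog prefix is "Mon DD HH:MM:SS"; the year is an assumption.
        prefix = " ".join(line.split()[:3])
        stamps.append(datetime.strptime(f"{year} {prefix}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

# The three OPERATIONAL-state losses quoted in this mail, one message per line.
sample = [
    "Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
]
print(token_loss_gaps(sample))  # [304, 307]
```

For the three losses quoted in this mail the gaps come out at 304 and 307 seconds, so a little over 5 minutes each time.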

Just before the problem started, a Samba message shows nmbd becoming local 
master browser for a workgroup on the interface used for cluster 
communications.

Jun 20 13:39:27 HOST1 nmbd[24506]: [2008/06/20 13:39:27, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   Samba name server NBM1 is now a local master browser for workgroup SMS_DOMAIN on subnet 162.16.96.229
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

"cman_tool status" shows both nodes and looks normal. It looks like clvmd is not 
happy, and df commands are hanging.

Could nmbd be causing this token loss? Any ideas on how to proceed?

(names and IPs have been changed).

Thanks

Bevan Broun
Solutions Architect
Ardec International
http://www.ardec.com.au
http://www.lisasoft.com
http://www.terrapages.com
Sydney
-----------------------
Suite 112,The Lower Deck
19-21 Jones Bay Wharf
Pirrama Road, Pyrmont 2009
Ph:  +61 2 8570 5000
Fax: +61 2 8570 5099



Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering GATHER state from 2.
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Saving state aru 16 high seq received 16
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce34
Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce38
Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce3c
Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce40
Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering RECOVERY state.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [0] member 162.16.96.229:
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [1] member 162.16.96.230:
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Did not need to originate any messages in recovery.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Sending initial ORF token
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
Jun 20 13:49:06 HOST1 openais[15265]: [SYNC ] This node is within the primary component and will provide service.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering OPERATIONAL state.
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.229
Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.230
Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 2
Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 1
Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

