Send Linux-ha-cvs mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.community.tummy.com/mailman/listinfo/linux-ha-cvs
or, via email, send a message with subject or body 'help' to
        [EMAIL PROTECTED]

You can reach the person managing the list at
        [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-ha-cvs digest..."


Today's Topics:

   1. Linux-HA CVS: lib by sunjd from  ([email protected])
   2. Linux-HA CVS: include by sunjd from 
      ([email protected])
   3. Linux-HA CVS: cts by panjiam from 
      ([email protected])
   4. Linux-HA CVS: cts by panjiam from 
      ([email protected])


----------------------------------------------------------------------

Message: 1
Date: Mon, 13 Mar 2006 01:54:14 -0700 (MST)
From: [email protected]
Subject: [Linux-ha-cvs] Linux-HA CVS: lib by sunjd from 
To: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>

linux-ha CVS committal

Author  : sunjd
Host    : 
Project : linux-ha
Module  : lib

Dir     : linux-ha/lib/fencing


Modified Files:
        stonithd_msg.c 


Log Message:
For bugs 1036 and 951:
1) Fixed several places that caused memory leaks.
2) Rewrote some pieces to make memory handling safer.
3) Added several allocators for complex data structures.
4) Adjusted some functions/variables to make the code clearer.
5) Tweaked logging.

The commit log for stonithd.c should be the same as this one; I made a
mistake while committing it. Sorry.


===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/lib/fencing/stonithd_msg.c,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -3 -r1.5 -r1.6
--- stonithd_msg.c      17 Oct 2005 19:13:48 -0000      1.5
+++ stonithd_msg.c      13 Mar 2006 08:54:13 -0000      1.6
@@ -54,6 +54,7 @@
                return HA_FAIL;
        }
        
+       ZAPMSG(msg_tmp);
        return HA_OK;
 }
 




------------------------------

Message: 2
Date: Mon, 13 Mar 2006 01:55:53 -0700 (MST)
From: [email protected]
Subject: [Linux-ha-cvs] Linux-HA CVS: include by sunjd from 
To: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>

linux-ha CVS committal

Author  : sunjd
Host    : 
Project : linux-ha
Module  : include

Dir     : linux-ha/include/fencing


Modified Files:
        stonithd_api.h 


Log Message:
Changes as part of the fix for bugs 951 and 1036

===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/include/fencing/stonithd_api.h,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -3 -r1.7 -r1.8
--- stonithd_api.h      24 Oct 2005 14:44:28 -0000      1.7
+++ stonithd_api.h      13 Mar 2006 08:55:52 -0000      1.8
@@ -61,13 +61,13 @@
        int             call_id;
        stonith_ret_t   op_result;      
 /*
- * By now op==QUERY node_list is only a char * type. 
+ * For now, node_list is only a char * type. 
  * When op==QUERY, it contains the names of the nodes who can stonith the node 
- * whose name is node_name. 
+ * whose name is node_name. A blank is the delimiter.
 * When op!=QUERY, it contains the name of the nodes who succeeded in stonithing
- * the node whose name is node_name. 
+ * the node whose name is node_name. A blank is the delimiter.
  */
-       void *          node_list;
+       void *  node_list;
 
 /*
  * Will pass the value to stonith_ops_callback.        
@@ -144,7 +144,7 @@
        int             op_result;      /* exit code as the real OCF RA */
 
 /* Internally use only */
-       void *          private_data;
+       void *          stonith_obj;
 } stonithRA_ops_t;
 
 /* It's an asynchronus api */



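The comment change above documents that node_list is a single blank-delimited string of node names. Splitting such a string is straightforward; a minimal Python sketch (parse_node_list is a hypothetical helper, not part of stonithd_api.h):

```python
# node_list arrives as one blank-delimited string, e.g. "node1 node2 node3".
# parse_node_list is a hypothetical illustration of consuming that format.
def parse_node_list(node_list):
    if not node_list:
        return []                 # None or "" -> no nodes
    return node_list.split()      # split() also collapses repeated blanks
```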

------------------------------

Message: 3
Date: Mon, 13 Mar 2006 02:36:24 -0700 (MST)
From: [email protected]
Subject: [Linux-ha-cvs] Linux-HA CVS: cts by panjiam from 
To: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>

linux-ha CVS committal

Author  : panjiam
Host    : 
Project : linux-ha
Module  : cts

Dir     : linux-ha/cts


Modified Files:
        CTSlab.py.in 


Log Message:
Added a 'seed' option for a user-specified random seed, bug #1125
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/cts/CTSlab.py.in,v
retrieving revision 1.51
retrieving revision 1.52
diff -u -3 -r1.51 -r1.52
--- CTSlab.py.in        9 Mar 2006 01:30:45 -0000       1.51
+++ CTSlab.py.in        13 Mar 2006 09:36:24 -0000      1.52
@@ -339,7 +339,7 @@
 
     def SupplyDefaults(self): 
         if not self.has_key("logger"):
-            self["logger"] = (SysLog(self), StdErrLog(self))
+            self["logger"] = (FileLog(self), StdErrLog(self))
         if not self.has_key("reset"):
             self["reset"] = Stonith()
         if not self.has_key("CMclass"):


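The point of a user-specified seed is reproducibility: the same seed replays the same "random" test sequence. The option's wiring isn't shown in this hunk; a sketch of the idea, using the modern random module rather than the whrandom the CTS code imports, with a hypothetical pick_actions helper:

```python
import random

# A fixed seed makes a "random" test sequence reproducible across runs.
# pick_actions is a hypothetical helper; the real CTSlab.py.in feeds the
# seed into its own RandomGen instead.
def pick_actions(nodes, seed):
    gen = random.Random(seed)     # independent generator, user-supplied seed
    return [(node, gen.choice(["start", "stop"])) for node in nodes]
```

Running twice with the same seed yields an identical action list, which is what makes a failing CTS run replayable.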


------------------------------

Message: 4
Date: Mon, 13 Mar 2006 02:37:50 -0700 (MST)
From: [email protected]
Subject: [Linux-ha-cvs] Linux-HA CVS: cts by panjiam from 
To: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>

linux-ha CVS committal

Author  : panjiam
Host    : 
Project : linux-ha
Module  : cts

Dir     : linux-ha/cts


Modified Files:
        CIB.py.in CM_LinuxHAv2.py.in CTS.py.in CTStests.py.in 


Log Message:
Make sure the machine is up before we ssh to it, bugs #1109 & #1124
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/cts/CIB.py.in,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -3 -r1.2 -r1.3
--- CIB.py.in   13 Mar 2006 02:14:55 -0000      1.2
+++ CIB.py.in   13 Mar 2006 09:37:50 -0000      1.3
@@ -8,7 +8,7 @@
 '''
 
 from UserDict import UserDict
-import sys, time, types, syslog, os, struct, string, signal, traceback
+import sys, time, types, syslog, whrandom, os, struct, string, signal, traceback
 from CTS  import ClusterManager
 from CM_hb import HeartbeatCM
 
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/cts/CM_LinuxHAv2.py.in,v
retrieving revision 1.138
retrieving revision 1.139
diff -u -3 -r1.138 -r1.139
--- CM_LinuxHAv2.py.in  8 Mar 2006 22:29:22 -0000       1.138
+++ CM_LinuxHAv2.py.in  13 Mar 2006 09:37:50 -0000      1.139
@@ -167,6 +167,11 @@
     def test_node_CM(self, node):
         '''Report the status of the cluster manager on a given node'''
 
+        if not self.ns.WaitForNodeToComeUp(node):
+             self.log("WaitForNodeToComeUp failed, node %s set to down" % node)
+             self.ShouldBeStatus[node] = self["down"]
+             return 3
+
         watchpats = [ ]
         watchpats.append("Current ping state: (S_IDLE|S_NOT_DC)")
         watchpats.append(self["Pat:They_started"]%node)
@@ -237,6 +242,10 @@
         idle_watch = CTS.LogWatcher(self["LogFileName"], watchpats, timeout)
         idle_watch.setwatch()
 
+        if not self.ns.WaitForAllNodesToComeUp(self.Env["nodes"]):
+            self.log("WaitForAllNodesToComeUp failed.")
+            return None
+
         any_up = 0
         for node in self.Env["nodes"]:
             # have each node dump its current state
@@ -252,6 +261,11 @@
 
     def is_node_dc(self, node, status_line=None):
         rc = 0
+
+        if not self.ns.WaitForNodeToComeUp(node):
+           self.log("WaitForNodeToComeUp %s failed." % node) 
+           return rc
+
         if not status_line: 
             status_line = self.rsh.readaline(node, self["StatusCmd"]%node)
 
@@ -275,6 +289,10 @@
 
     def isolate_node(self, node, allowlist):
         '''isolate the communication between the nodes'''
+        if not self.ns.WaitForNodeToComeUp(node):
+           self.log("WaitForNodeToComeUp %s failed." % node)
+           return None
+ 
         rc = self.rsh(node, self["BreakCommCmd2"]%allowlist)
         if rc == 0:
             return 1
@@ -389,6 +407,10 @@
     def find_partitions(self):
         ccm_partitions = []
 
+        if not self.ns.WaitForAllNodesToComeUp(self.Env["nodes"]):
+            self.log("WaitForAllNodesToComeUp failed.")
+            return None
+
         for node in self.Env["nodes"]:
             self.debug("Retrieving partition details for %s" %node)
             if self.ShouldBeStatus[node] == self["up"]:
@@ -419,6 +441,10 @@
         if not node_list:
             node_list = self.Env["nodes"]
 
+        if not self.ns.WaitForAllNodesToComeUp(node_list):
+            self.log("WaitForAllNodesToComeUp failed in HasQuorum.")
+            return 0
+
         for node in node_list:
             if self.ShouldBeStatus[node] == self["up"]:
                 quorum = self.rsh.readaline(node, self["QuorumCmd"])
@@ -438,6 +464,10 @@
         return complist
     
     def NodeUUID(self, node):
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in NodeUUID." % node)
+            return ""
+
         lines = self.rsh.readlines(node, self["UUIDQueryCmd"])
         for line in lines:
             self.debug("UUIDLine:"+ line) 
@@ -447,6 +477,10 @@
         return ""       
                             
     def StandbyStatus(self, node):
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in StandbyStatus." % node)
+            return ""
+
         out=self.rsh.readaline(node, self["StandbyQueryCmd"]%node)
         if not out:
             return "off"
@@ -457,6 +491,10 @@
     # status == "on" : Enter Standby mode
     # status == "off": Enter Active mode
     def SetStandbyMode(self, node, status):
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in SetStandbyMode." % node)
+            return True
+
         current_status = self.StandbyStatus(node)
        cmd = self["StandbyCmd"] % (node, status)
         ret = self.rsh(node, cmd)
@@ -512,7 +550,10 @@
         on the given node in the cluster.
         We call the status operation for the resource script.
         '''
-        
+        if not self.CM.ns.WaitForNodeToComeUp(nodename):
+            self.CM.log("WaitForNodeToComeUp %s failed in IsRunningOn." % nodename)
+            return 0
+ 
         out=self.CM.rsh.readaline(nodename, self.CM["IsRscRunning"]%self.rid)
         return re.search("0",out)
         
@@ -528,6 +569,10 @@
         '''
         Execute an operation on the resource
         '''
+        if not self.CM.ns.WaitForNodeToComeUp(nodename):
+            self.CM.log("WaitForNodeToComeUp %s failed in _ResourceOperation." % nodename)
+            return 0
+
         self.CM.rsh.readaline(nodename, self.CM["ExecuteRscOp"]%(self.rid,operation))
         return self.CM.rsh.lastrc == 0
 
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/cts/CTS.py.in,v
retrieving revision 1.51
retrieving revision 1.52
diff -u -3 -r1.51 -r1.52
--- CTS.py.in   23 Feb 2006 23:01:08 -0000      1.51
+++ CTS.py.in   13 Mar 2006 09:37:50 -0000      1.52
@@ -301,7 +301,42 @@
 #+                if not matchobj:
 #+                    self.regexes.append(regex)
 
-            
+class NodeStatus:
+    def __init__(self, Env):
+        self.Env = Env
+    def IsNodeBooted(self, node):
+        '''Return TRUE if the given node is booted (responds to pings)'''
+        return os.system("@PING@ -nq -c1 @PING_TIMEOUT_OPT@ %s >/dev/null 2>&1" % node) == 0
+
+    def WaitForNodeToComeUp(self, node, Timeout=300):
+        '''Return TRUE when given node comes up, or None/FALSE if timeout'''
+        timeout=Timeout
+        anytimeouts=0
+        while timeout > 0:
+            if self.IsNodeBooted(node):
+                if anytimeouts:
+                     # Fudge to wait for the system to finish coming up
+                     time.sleep(30)
+                     self.Env.log("Node %s now up" % node)
+                return 1
+
+            time.sleep(1)
+            if (not anytimeouts):
+                self.Env.log("Waiting for node %s to come up" % node)
+                
+            anytimeouts=1
+            timeout = timeout - 1
+
+        self.Env.log("%s did not come up within %d tries" % (node, Timeout))
+
+    def WaitForAllNodesToComeUp(self, nodes, timeout=300):
+        '''Return TRUE when all nodes come up, or FALSE if timeout'''
+
+        for node in nodes:
+            if not self.WaitForNodeToComeUp(node, timeout):
+                return None
+        return 1
+
 class ClusterManager(UserDict):
     '''The Cluster Manager class.
     This is an subclass of the Python dictionary class.
@@ -368,6 +403,7 @@
         self.ShouldBeStatus={}
         self.OurNode=string.lower(os.uname()[1])
         self.ShouldBeStatus={}
+        self.ns = NodeStatus(self.Env)
 
     def errorstoignore(self):
         '''Return list of errors which are 'normal' and should be ignored'''
@@ -415,6 +451,10 @@
         if self.ShouldBeStatus[node] != self["down"]:
            return 1
 
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed" % node)
+            return 0
+
         patterns = []
         # Technically we should always be able to notice ourselves starting
        if self.upcount() == 0:
@@ -461,6 +501,11 @@
         '''Start up the cluster manager on a given node with none-block mode'''
 
         self.debug("Starting %s on node %s" %(self["Name"], node))
+
+        if not self.ns.IsNodeBooted(node):
+            self.log("node %s not booted, StartaCMnoBlock failed." % node)
+            return 0
+
         self.rsh.noBlock(node, self["StartCmd"])
         self.ShouldBeStatus[node]=self["up"]
         return 1
@@ -474,6 +519,10 @@
         if self.ShouldBeStatus[node] != self["up"]:
            return 1
 
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed" % node)
+            return 0
+
         if self.rsh(node, self["StopCmd"]) == 0:
             self.ShouldBeStatus[node]=self["down"]
            self.cluster_stable(self["DeadTime"])
@@ -488,6 +537,11 @@
         '''Stop the cluster manager on a given node with none-block mode'''
 
         self.debug("Stopping %s on node %s" %(self["Name"], node))
+
+        if not self.ns.IsNodeBooted(node):
+            self.log("node %s not booted, StopaCMnoBlock failed." % node)
+            return 0
+
         self.rsh.noBlock(node, self["StopCmd"])
         self.ShouldBeStatus[node]=self["down"]
         return 1
@@ -504,6 +558,9 @@
         '''Force the cluster manager on a given node to reread its config
            This may be a no-op on certain cluster managers.
         '''
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed" % node)
+            return 0
 
         rc=self.rsh(node, self["RereadCmd"])
         if rc == 0:
@@ -599,7 +656,10 @@
     def isolate_node(self, node):
 
         '''isolate the communication between the nodes'''
-    
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in isolate_node" % node)
+            return None
+
         rc = self.rsh(node, self["BreakCommCmd"])
         if rc == 0:
             return 1
@@ -610,6 +670,10 @@
     def unisolate_node(self, node):
 
         '''fix the communication between the nodes'''
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in unisolate_node" % node)
+            return None
+
         rc = self.rsh(node, self["FixCommCmd"])
         if rc == 0:
             return 1
@@ -619,6 +683,10 @@
         
     def reducecomm_node(self,node):
         '''reduce the communication between the nodes'''
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in reducecomm_node" % node)
+            return None
+
         rc = self.rsh(node, self["ReduceCommCmd"]%(self.Env["XmitLoss"],self.Env["RecvLoss"]))
         if rc == 0:
             return 1
@@ -629,6 +697,11 @@
     def savecomm_node(self,node):
         '''save current the communication between the nodes'''
         rc = 0
+
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in savecomm_node" % node)
+            return None
+
         if float(self.Env["XmitLoss"])!=0 or float(self.Env["RecvLoss"])!=0 :
             rc = self.rsh(node, self["SaveFileCmd"]);
         if rc == 0:
@@ -640,6 +713,10 @@
     def restorecomm_node(self,node):
         '''restore the saved communication between the nodes'''
         rc = 0
+        if not self.ns.WaitForNodeToComeUp(node):
+            self.log("WaitForNodeToComeUp %s failed in restorecomm_node" % node)
+            return None
+
         if float(self.Env["XmitLoss"])!=0 or float(self.Env["RecvLoss"])!=0 :
             rc = self.rsh(node, self["RestoreCommCmd"]);
         if rc == 0:
@@ -804,6 +881,10 @@
         self.KillCmd = "killall -9 " + self.name
         
     def kill(self, node):
+        if not self.CM.ns.WaitForNodeToComeUp(node):
+            self.CM.log("WaitForNodeToComeUp %s failed in kill" % node)
+            return None
+
         if self.CM.rsh(node, self.KillCmd) != 0:
             self.log ("Warn: Kill %s failed on node %s" %(name,node))
             return None
@@ -892,44 +973,6 @@
             j=j-1
 
 
-class NodeStatus:
-    def __init__(self, Env):
-        self.Env = Env
-    def IsNodeBooted(self, node):
-        '''Return TRUE if the given node is booted (responds to pings'''
-        return os.system("@PING@ -nq -c1 @PING_TIMEOUT_OPT@ %s >/dev/null 2>&1" % node) == 0
-
-    def WaitForNodeToComeUp(self, node, Timeout=300):
-        '''Return TRUE when given node comes up, or None/FALSE if timeout'''
-        timeout=Timeout
-        anytimeouts=0
-        while timeout > 0:
-            if self.IsNodeBooted(node):
-                if anytimeouts:
-                     # Fudge to wait for the system to finish coming up
-                     time.sleep(30)
-                     self.Env.log("Node %s now up" % node)
-                return 1
-
-            time.sleep(1)
-            if (not anytimeouts):
-                self.Env.log("Waiting for node %s to come up" % node)
-                
-            anytimeouts=1
-            timeout = timeout - 1
-
-        self.Env.log("%s did not come up within %d tries" % (node, Timeout))
-
-    def WaitForAllNodesToComeUp(self, nodes, timeout=300):
-        '''Return TRUE when all nodes come up, or FALSE if timeout'''
-
-        for node in nodes:
-            if not self.WaitForNodeToComeUp(node, timeout):
-                return None
-        return 1
-
-
-
 class InitClusterManager(ScenarioComponent):
     (
 '''InitClusterManager is the most basic of ScenarioComponents.
@@ -938,7 +981,7 @@
 as they might have been rebooted or crashed for some reason beforehand.
 ''')
     def __init__(self, Env):
-       self.ns=NodeStatus(Env)
+        pass
 
     def IsApplicable(self):
         '''InitClusterManager is so generic it is always Applicable'''
@@ -947,8 +990,6 @@
     def SetUp(self, CM):
         '''Basic Cluster Manager startup.  Start everything'''
 
-        if not self.ns.WaitForAllNodesToComeUp(CM.Env["nodes"]):
-            return None
         CM.prepare()
 
         #        Clear out the cobwebs ;-)
@@ -969,8 +1010,6 @@
     def TearDown(self, CM):
         '''Set up the given ScenarioComponent'''
 
-        self.ns.WaitForAllNodesToComeUp(CM.Env["nodes"])
-
         # Stop the cluster manager everywhere
 
         CM.log("Stopping Cluster Manager on all nodes")
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/cts/CTStests.py.in,v
retrieving revision 1.138
retrieving revision 1.139
diff -u -3 -r1.138 -r1.139
--- CTStests.py.in      23 Feb 2006 23:01:08 -0000      1.138
+++ CTStests.py.in      13 Mar 2006 09:37:50 -0000      1.139
@@ -161,7 +161,7 @@
         if not self.Scenario.SetUp(self.CM):
             return None
 
-       self.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"])
+       self.CM.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"])
         testcount=1
         time.sleep(30)
 
@@ -190,7 +190,6 @@
             else:
                 self.incr("failure")
                 self.CM.log("Test %s (%s) \t[FAILED]" %(test.name,nodechoice))
-               self.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"])
                 # Better get the current info from the cluster...
                 self.CM.statall()
                # Make sure logging is working and we have enough disk space...
@@ -734,7 +733,6 @@
         failed=[]
         self.starttime=time.time()
         for node in self.CM.Env["nodes"]:
-           self.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"])
             if not self.start(node):
                 failed.append(node)
 
@@ -2025,7 +2023,7 @@
         self.incr("calls")
         startset = []
         stopset = []
-        
+       
         #decide what to do with each node
         for node in self.CM.Env["nodes"]:
             action = self.CM.Env.RandomGen.choice(["start","stop"])
@@ -2038,6 +2036,10 @@
         self.CM.debug("start nodes:" + repr(startset))
         self.CM.debug("stop nodes:" + repr(stopset))
 
+        if not self.CM.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"]):
+            self.log("WaitForAllNodesToComeUp failed")
+            return self.skipped()
+ 
         #add search patterns
         watchpats = [ ]
         for node in stopset:
@@ -2214,6 +2216,11 @@
     def __call__(self, dummy):
         '''Perform the 'SimulStopLite' setup work. '''
         self.incr("calls")
+
+        if not self.CM.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"]):
+            self.log("WaitForAllNodesToComeUp failed")
+            return self.skipped()
+
         self.CM.debug("Setup: " + self.name)
 
         #     We ignore the "node" parameter...
@@ -2271,6 +2278,11 @@
         '''Perform the 'SimulStartList' setup work. '''
         self.incr("calls")
         self.CM.debug("Setup: " + self.name)
+
+        if not self.CM.ns.WaitForAllNodesToComeUp(self.CM.Env["nodes"]):
+            self.log("WaitForAllNodesToComeUp failed")
+            return self.skipped()
+
         #        We ignore the "node" parameter...
         watchpats = [ ]
 



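The new NodeStatus.WaitForNodeToComeUp above polls with ping until the node answers or a timeout expires. The same pattern in a standalone Python sketch, with the probe injected as a callable so it can stand in for the real ping check (wait_for_node is a hypothetical name, not part of CTS):

```python
import time

def wait_for_node(probe, timeout=300, interval=1):
    """Poll probe() until it returns True, or give up after timeout seconds.

    probe is any zero-argument callable; the real CTS code shells out to
    ping.  Returns True when the node answered, False on timeout, mirroring
    WaitForNodeToComeUp's success/failure convention.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

Injecting the probe also makes the loop testable without real hosts, which the os.system-based original is not.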

------------------------------

_______________________________________________
Linux-ha-cvs mailing list
[email protected]
http://lists.community.tummy.com/mailman/listinfo/linux-ha-cvs


End of Linux-ha-cvs Digest, Vol 28, Issue 26
********************************************
