Dejan Muhamedagic wrote:
Hi,

On Tue, Feb 19, 2008 at 03:38:46AM -0600, Michael Brennen wrote:
On Sun, 17 Feb 2008, Michael Brennen wrote:

Heartbeat 2.1.3, crm enabled

I've built an initial drbd master/slave on two systems, lvc7 and lvc8, following http://www.linux-ha.org/DRBD/HowTov2. DRBD comes up in P/S mode, but it will not fail over when I kill the master; the slave stays secondary. stonith was not working, so I've decided to make that work in v2 (I had it working in a v1 build about a month ago).
.....
Then, I am trying to define the stonith device in xml; the source to the apc3.xml file is attached. I am adding from a separate command line:
Just to clear this up, I found my own silly problem: I had misspelled one of the parameters (password, not passwd) for the apcmaster stonith device in the xml.

Does that mean that the patch you posted is not necessary?
There have been several reports of this issue over the last few years: the plugin looks for the 'Escape character' string before the user name prompt, but the string isn't there. I'm pretty sure it depends on which flavor of telnet is being used, but I still believe the correct patch is to look for both possibilities instead of just one or the other, something like the following:

static struct Etoken EscapeChar[] = { {"Escape character is '^]'.", 0, 0}
                                    , {"User Name :", 1, 0}
                                    , {NULL, 0, 0}};

: :

static int
MSLogin(struct pluginDevice * ms)
{
        int rc;

        /*
         * Apparently some telnet apps display the escape character while
         * others don't, so we need to handle both possibilities...
         *
         * rc == 0 : "Escape character is '^]'." found
         * rc == 1 : "User Name :" found
         * rc <  0 : Neither found or timeout
         */
        if ((rc = StonithLookFor(ms->rdfd, EscapeChar, 10)) < 0) {
                return(errno == ETIMEDOUT ? S_TIMEOUT : S_OOPS);
        } else if (rc == 0) {
                /*
                 * We should be looking at something like this:
                 *      User Name :
                 */
                EXPECT(ms->rdfd, login, 10);
        }
        SEND(ms->wrfd, ms->user);

I sent a patch similar to this one, or maybe exactly like it :-), to a couple of folks and asked them to verify that it worked, but I never got any feedback, so it never made it into production.
The stonith daemons start successfully now, but with a monitor interval of 15s, one of the two fails fairly quickly. The APC (a 9211 MasterSwitch) allows only a single login, and I wonder whether the two daemons are colliding, with one timing out and giving up.

Did you take a look at the logs to confirm this?
You should be able to see something to this effect in the logs. Did you add "debug 1" to your ha.cf files and look? Based on the timestamps, you should be able to tell whether one daemon tries to log in while the other one IS logged in.
Thanks,

Dejan

Fortunately this is a test cluster; from what I have seen, I would never put this PDU in production. I will work on an external/ssh stonith setup to see whether I can avoid the problems with the APC.

   -- Michael
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems