"ADSM: Dist Stor Manager" <[email protected]> wrote on 12/22/2005
11:53:11 AM:

> In an MSCS cluster, an admin of one of our higher-profile client machines
> failed over from one machine (OLALPHA) back to the other (OLBRAVO) after
> BRAVO crashed this morning.
>
>
>
> Since then, I've been having a devil of a time with an MSCS cluster
> resource, the scheduler for the cluster drive on BRAVO, which will not come
> up. To begin with, it posted ANS1835E, ANS1025E and ANS1570E, all of which
> point to authentication problems. I updated the node password, issued a
> 'q ses -optfile...', and it authenticated fine. When I try to bring
> the cluster resource back online, it stays up for a few seconds, fails,
> and when I check the registry, the password has disappeared! What in
> the world? It has also posted ANS1029E and ANS2050E while I've been
> trying to get the cluster resource to work, and the base client (which
> backs up C:, D: and the system state) has been issuing ANS1977E with
> "ccCreateTimerFile: Unable to create timer file" and "errno=13
> error: Permission denied".
>

It sounds like the services weren't set up properly from the start, or the
service password somehow got out of sync.  When setting up the services in
the cluster, it is very important to fully set them up on each node of the
cluster and confirm they work BEFORE adding the service to the cluster
manager.  I think your only solution is to remove the service from the
cluster configuration, then remove and re-create the services on one node,
restart the service several times and make sure it works OK.  Then fail over
to the other node and repeat.  Once you are sure both nodes work, add the
service back into the cluster, and make sure the correct registry key is
set up to replicate during failover.  Fail back and forth a couple of times
to make sure everything is working properly.

The big drawback is that you will need to do this during a downtime window
in which you can fail the nodes over quite a few times.  That is why it is
so important to get it right from the start.

Every time I have seen the disappearing password in a cluster, it was
because the services weren't set up correctly, or completely, before being
configured in the cluster.  In one rare case, special characters in the
node password also caused a problem, and the password wouldn't replicate
properly.  For this reason I always use only letters and numbers in cluster
node passwords (no underscores, dashes, etc.).
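As an illustration only, here is a minimal Python sketch of that letters-and-numbers rule: one helper checks a candidate password, the other generates a random one. The function names, the default length of 12, and restricting the set to ASCII letters and digits are assumptions made for this example; nothing here is part of TSM, so check your server's own password length rules before using it.

```python
import secrets
import string

# Assumed safe set: ASCII letters and digits only (no underscores, dashes, etc.).
ALLOWED = string.ascii_letters + string.digits

def is_cluster_safe(password: str) -> bool:
    """Return True if the password is non-empty and uses only ASCII letters and digits."""
    # str.isalnum() is avoided on purpose: it also accepts non-ASCII letters.
    return len(password) > 0 and all(ch in ALLOWED for ch in password)

def make_cluster_password(length: int = 12) -> str:
    """Generate a random password drawn only from ALLOWED (hypothetical helper)."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALLOWED) for _ in range(length))

if __name__ == "__main__":
    candidate = make_cluster_password()
    print(candidate, is_cluster_safe(candidate))
    print(is_cluster_safe("node_pw-1"))  # False: contains '_' and '-'
```

The `secrets` module is used rather than `random` because it is intended for generating passwords.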
