Title: RE: [ActiveDir] OT: Windows 2003 Cluster
All,
The cluster is now operational.  A series of events combined to make this a complex problem to troubleshoot.  The security template applied by our higher set the STORE LM HASH value to enabled.  With this setting enabled prior to SP1, any new password for the cluster service account must be 15 characters or longer, unless you apply the hotfix.  We changed the cluster service account to a 15-character password, applied the LM hash hotfix, restarted the services, and rebooted the nodes.  The cluster has no problem communicating at this point.  Since our password change on the cluster account 2 weeks ago was the catalyst for the loss of communication between the nodes, this was very difficult to troubleshoot, but we are now past that point and are left only with migrating the printer drivers that exist on node 2 but not on node 1.  Apparently Windows 2003 is not supposed to need print migration the way Windows 2000 Advanced Server did, which called for installing print drivers on node 1, node 2, and the virtual print node.  In practice, that is how it works in Windows 2003 as well.
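
For anyone who hits the same thing, the 15-character rule follows from the LM hash format itself: an LM hash can only represent passwords of up to 14 characters, so a password of 15 or more characters has no LM hash to store. A minimal Python sketch of that length check, assuming the standard 14-character LM limit (the function name is made up for illustration and is not part of any Windows tool):

```python
# LM hashes cover at most 14 characters; longer passwords have no LM hash.
LM_MAX_LENGTH = 14


def lm_hash_possible(password: str) -> bool:
    """Return True if Windows could still derive an LM hash for this password."""
    return len(password) <= LM_MAX_LENGTH


# Hypothetical example values, not real credentials.
for candidate in ("Fourteen-chars", "Fifteen-chars!!"):
    status = "LM hash possible" if lm_hash_possible(candidate) else "no LM hash"
    print(len(candidate), status)
```

A password that reports "no LM hash" here is one that satisfies the 15-character requirement described above.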
 
Thanks, everyone, for your suggestions and ideas; they helped tremendously.
 
Nate
GD-NS


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Tuesday, July 26, 2005 2:04 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Almost sounds like a cluster is not providing the benefits you were after.
 
Not sure I can be of any help with the next piece.  That is odd, but you might have a look at the TS servers and see if they're logging anything else.  Do the same on the cluster to see if there is anything in the security logs.  Might it have to do with the hotfix?
 
Al


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Tue 7/26/2005 1:48 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Well Al, so far I have figured out that the cluster account requires a password of 15 characters or more without SP1 or the hotfix for it.  So I changed the account password and restarted the services, and both nodes are online.  The only problem now is that I see only half the printers on the new node, and our shares are inaccessible from the cluster.  When trying to log on as a regular user, not an admin, I get an error that states:  YOU DO NOT HAVE PERMISSION TO ACCESS YOUR CENTRAL PROFILE LOCATED AT \\SERVERNAME\SHARE$\USERNAME.  CONTACT YOUR NETWORK ADMINISTRATOR.  It is logged with source Userenv and Event ID 1000.  So now everybody wants to know why they can't get their profiles, and I am scrambling for an answer.  It's not permissions or share permissions; I have opened them wide open, and I can't understand it, because it happens only to regular users and only to users of the Terminal Server environment.
 
Today is a crazy day!!!!
 
Nate 


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 2:40 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I'm interested to hear how it works out. 
 
When I mentioned the HBA, I was thinking more along the lines of ensuring that there are no issues with the physical HBA.  When an HBA goes, the symptoms are often strange and unexpected.  The same goes for the ports and switches between the HBA and the SAN.
 
Al 


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 1:10 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Yes, I pulled up the config GUI, read the config, and compared the functioning node's config with the failing node's config, and they are identical.  The HBA sees all assigned LUNs as well.  I don't think it is a storage issue.  I have been on the phone with Microsoft, and they said it may be a security issue and told me to reset the cluster account password and recycle the services on both nodes.  However, I cannot do that until downtime is allowed, so I will probably have to try that tonight.  I don't understand their idea of it being a password issue, though, because they had me log in as the cluster service account, but they said the DCs may have a different password in AD than the cluster nodes have in SCM.  They said it doesn't make sense to them either, but asked me to try it.
 
Nate


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 12:08 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Have you also verified that the HBA is functioning correctly?


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 11:21 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

It had WMI access-denied errors that entailed ripping apart the WMI database repository, and since WMI was not starting, the cluster could not read the WMI information and did not see the other node properly.  I used the resetquorum switch, which failed at the command line with a 1067 "could not start service" error.  Our Microsoft Premier support call entailed doing everything I had already done, and then they started researching (Google), so I told them I would keep troubleshooting and asked them to call me back when they think of something.  I have confirmed that the WWN on the SAN is the WWN of the HBA in the failing node, and the configuration is intact for that node.
 
Nathaniel


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 11:01 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

It's this that gives me the heartache: " The SAN still has the configuration data for the WWN of the node "      
 
In my experience, when troubleshooting, always assume nothing is correct and troubleshoot accordingly. Those errors indicate that it cannot talk to the disk properly. It's possible that's because the other node owns it; however, it is also possible that a configuration change has been made at some point.
 
It pays to be suspicious of the configuration even if you think it has already been done a long time ago. It is not a static configuration and it's worth it to ensure that it is configured properly. After all, the other node failed for a reason right?
 
I also assume that you used the -resetquorum etc switches (syntax) right?
 
That looks suspiciously like a disk access error though.  Something about not being able to read the disk which may also indicate a failure at a different level (HBA for example?)
 
Out of curiosity, what was the failure that the node was exhibiting prior to the rebuild?
 
Al
 
 
 


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:36 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

You are correct, this is a SAN configuration with JNI FC HBAs.  The node was configured and running for a long while before it failed.  The SAN still has the configuration data for the WWN of the node, as it was already configured as a node previously.  Same node, same card, same WWN, same system, same name; everything is the same, basically.  In the event log, the only error that presents itself is a 1209 error in the system log, source ClusDisk, description: Cluster service is requesting a bus reset for device \Device\Clusdisk3Part0.  Other than that, it's not logging any other errors.  The cluster log is logging the error during starting of the service: PHYSICAL DISK <DISK Q:> [DISKARB] FAILED TO READ (SECTOR 12), ERROR 170.  I checked the Microsoft site, and it talks about the case when both nodes are coming up at the same time, but this is not the case here, as one node is already up with resources online and everything.
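
As a reading aid for the numeric codes in this thread: 170 and 1067 are standard Win32 system error codes (ERROR_BUSY and ERROR_PROCESS_ABORTED). A rough Python sketch that pulls the sector and error code out of a DiskArb line and looks the code up in a small table. The regular expression is an assumption based only on the single log line quoted above, and the helper name is hypothetical:

```python
import re

# Win32 system error codes (standard values from winerror.h).
WIN32_ERRORS = {
    170: "ERROR_BUSY: The requested resource is in use.",
    1067: "ERROR_PROCESS_ABORTED: The process terminated unexpectedly.",
}

# Assumed pattern, modeled on the one cluster log line quoted in this thread.
DISKARB_READ_FAILURE = re.compile(
    r"\[DiskArb\].*?failed to read \(sector (\d+)\), error (\d+)",
    re.IGNORECASE,
)


def explain_diskarb_line(line):
    """Return (sector, code, meaning) for a DiskArb read failure, else None."""
    match = DISKARB_READ_FAILURE.search(line)
    if match is None:
        return None
    sector, code = int(match.group(1)), int(match.group(2))
    meaning = WIN32_ERRORS.get(code, "unknown code (not in this small table)")
    return sector, code, meaning


line = "PHYSICAL DISK <DISK Q:> [DISKARB] FAILED TO READ (SECTOR 12), ERROR 170"
print(explain_diskarb_line(line))
```

Read this way, error 170 says the disk was busy when the joining node tried to read it, which may simply reflect the other node holding the quorum disk rather than a hardware fault.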
 
Nathaniel


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 10:20 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Ruled out storage issues?  Can we assume this is a SAN configuration? And I assume that the new node has the appropriate zoning information configured correctly for its WWN?  That would be a change of course, but...
 
What do you see in the event log on that node and for the cluster?


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:03 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I did evict the node, forced a cleanup, rebuilt a new member server, joined it to the domain, and added it as a node to the existing cluster.  This is the result of that.  I don't see any way this could be a naming issue, as name resolution for DNS and WINS is completely functional from that node to the other nodes, and vice versa.  Storage config is ruled out because there has not been any change in our storage setup.


From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 9:57 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I'm confused.  Why didn't you just evict the failing node and join the new one?  Are you sure you don't have a naming issue or perhaps a storage config issue?  I see nothing about either of those.
 
Al


From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 9:49 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I did confirm that the cluster service account is a member of the local Administrators group on both boxes, that the passwords I entered are correct, that the account is not locked out, and that it has the correct user rights on the local node.  I wish that were the answer!!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Hunter, Laura E.
Sent: Monday, July 25, 2005 9:42 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Have you explicitly added the cluster service account to the local Administrators group on the two nodes?  I had a few bizarre niggling cluster issues that were resolved by doing that.  Even though the service account was already a local admin on the box by virtue of group membership, the cluster service didn't seem to be satisfied until I had specifically added the individual user account itself.

HTH

- Laura

> -----Original Message-----
> From: Bahta Nathaniel V Contr NASIC/SCNA
> [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 25, 2005 6:07 AM
> To: [email protected]
> Subject: RE: [ActiveDir] OT: Windows 2003 Cluster
>
> Hey gang,
>
> I have a 2003 cluster and one of the nodes was rebuilt because it was
> failing.  I cannot get the quorum resource to function correctly on
> the new node.
>
> Here is what I have done:
>
> Rebuilt and patched the failing node.
>
> Blocked all group policy I could and put it in a separate OU. 
>
> Used KB article to ensure cluster service account has appropriate
> permissions on node.
>
> Used KB article to ensure LOCAL SERVICE accounts and SERVICE accounts
> have appropriate permissions on the node.
>
> Disabled LMHASH storage requirement of 14 character cluster service
> account password.
>
> Compared services and security on failing node using Resultant Set of
> Policy wizard and verified that both nodes have the same security in
> place.
>
> Regenerated failing WMI database repository on failing node.
>
> Started cluster service on failing node using /fixquorum switch.
>
> Attempted to start cluster service on failing node using /resetquorum
> switch. The cluster service failed to start, producing a 1067 error.
>
> Rebuilt quorum from functioning node by copying ChXXX.tmp file from
> source node to failing node in safe mode and renaming ChXXX.tmp to
> CLUSDB
>
> Ran NTBACKUP.EXE on functioning node and backed up the system state,
> restored Cluster Information using the system state backup and used
> option to restore quorum info to all nodes as well.
>
>
> Does anyone have any ideas on how to make the quorum function on the
> new node?  Any help would be appreciated greatly.
>
> Thanks,
>
> Nathaniel Bahta
> GD-NS
> List info   : http://www.activedir.org/List.aspx
> List FAQ    : http://www.activedir.org/ListFAQ.aspx
> List archive:
> http://www.mail-archive.com/activedir%40mail.activedir.org/
>
>
List info   : http://www.activedir.org/List.aspx
List FAQ    : http://www.activedir.org/ListFAQ.aspx
List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/