Have you also verified that the HBA is functioning correctly?

________________________________
From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 11:21 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

It had WMI access denied errors that entailed ripping apart the WMI database repository. Since WMI was not starting, the cluster could not read the WMI information and did not see the other node properly. I used the resetquorum switch, which failed at the command line with a 1067 "could not start service" error.

Our Microsoft Premier support call entailed doing everything I had already done, and then they started researching (Google), so I told them I would keep troubleshooting and asked them to call me back when they think of something as well.

I have confirmed that the WWN on the SAN is the WWN on the HBA that is in the failing node, and the configuration is intact for that node.

Nathaniel

________________________________
From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 11:01 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

It's this that gives me the heartache: "The SAN still has the configuration data for the WWN of the node"

In my experience, when troubleshooting, always assume nothing is correct and troubleshoot accordingly. Those errors indicate that it cannot talk to the disk properly. It's possible that's because the other node owns it; however, it is also possible that a configuration change has been made at some point. It pays to be suspicious of the configuration even if you think it was already done a long time ago. It is not a static configuration, and it's worth it to ensure that it is configured properly. After all, the other node failed for a reason, right?

I also assume that you used the -resetquorum etc. switches (syntax), right? That looks suspiciously like a disk access error though.
Something about not being able to read the disk, which may also indicate a failure at a different level (HBA, for example?).

Out of curiosity, what was the failure that the node was exhibiting prior to rebuild?

Al

________________________________
From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:36 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

You are correct, this is a SAN configuration with JNI FC HBAs. The node was configured and running for a long while before it failed. The SAN still has the configuration data for the WWN of the node, as it was already configured as a node previously. Same node, same card, same WWN, same system, same name; everything is the same, basically.

In the event log, the only error that presents itself is a 1209 error from the system log, source ClusDisk, Description: Cluster service is requesting a bus reset for device \Device\Clusdisk3Part0. Other than that, it's not logging any other errors. The cluster log is logging the error during starting of the service: PHYSICAL DISK <DISK Q:> [DISKARB] FAILED TO READ (SECTOR 12), ERROR 170. I checked the Microsoft site, and it talks about when both nodes are coming up at the same time, but this is not the case, as one node is already up with resources online and everything.

Nathaniel

________________________________
From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 10:20 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Ruled out storage issues? Can we assume this is a SAN configuration? And I assume that the new node has the appropriate zoning information configured correctly for its WWN? That would be a change of course, but...

What do you see in the event log on that node and for the cluster?
________________________________
From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:03 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I did evict the node, forcedcleanup, rebuilt a new member server, joined it to the domain, and added it as a node to the existing cluster. This is the result of that. I don't see any way this could be a naming issue, as the name resolution for DNS and WINS is completely functional from that node to other nodes, and vice versa. Storage config is ruled out because there has not been any change in our storage setup.

________________________________
From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 9:57 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I'm confused. Why didn't you just evict the failing node and join the new one? Are you sure you don't have a naming issue or perhaps a storage config issue? I see nothing about either of those.

Al

________________________________
From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 9:49 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

I did confirm that the cluster service account is a member of the local Administrators group on both boxes, that the passwords I entered are correct, that the account is not locked out, and that it has the correct user rights on the local node. I wish that were the answer!!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hunter, Laura E.
Sent: Monday, July 25, 2005 9:42 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Have you explicitly added the cluster service account to the local Administrators group on the two nodes? I had a few bizarre niggling cluster issues that were resolved by doing that.
Even though the service account was already a local admin on the box by virtue of group membership, the cluster service didn't seem to be satisfied until I had specifically added the individual user account itself.

HTH - Laura

> -----Original Message-----
> From: Bahta Nathaniel V Contr NASIC/SCNA
> [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 25, 2005 6:07 AM
> To: [email protected]
> Subject: RE: [ActiveDir] OT: Windows 2003 Cluster
>
> Hey gang,
>
> I have a 2003 cluster and one of the nodes was rebuilt because it was
> failing. I cannot get the quorum resource to function correctly on
> the new node.
>
> Here is what I have done:
>
> Rebuilt and patched the failing node.
>
> Blocked all group policy I could and put it in a separate OU.
>
> Used KB article to ensure cluster service account has appropriate
> permissions on node.
>
> Used KB article to ensure LOCAL SERVICE accounts and SERVICE accounts
> have appropriate permissions on the node.
>
> Disabled LMHASH storage requirement of 14 character cluster service
> account password.
>
> Compared services and security on failing node using Resultant Set of
> Policy wizard and verified that both nodes have the same security in
> place.
>
> Regenerated failing WMI database repository on failing node.
>
> Started cluster service on failing node using /fixquorum switch.
>
> Attempted to start cluster service on failing node using /resetquorum
> switch .... It failed to start cluster service, producing a 1067 error.
>
> Rebuilt quorum from functioning node by copying ChXXX.tmp file from
> source node to failing node in safe mode and renaming ChXXX.tmp to
> CLUSDB.
>
> Ran NTBACKUP.EXE on functioning node and backed up the system state,
> restored Cluster Information using the system state backup and used
> option to restore quorum info to all nodes as well.
>
>
> Does anyone have any ideas on how to make the quorum function on the
> new node? Any help would be appreciated greatly.
>
> Thanks,
>
> Nathaniel Bahta
> GD-NS
> List info   : http://www.activedir.org/List.aspx
> List FAQ    : http://www.activedir.org/ListFAQ.aspx
> List archive:
> http://www.mail-archive.com/activedir%40mail.activedir.org/
