Almost sounds like a cluster is not providing the benefits you were after. 
 
Not sure I can be of any help with the next piece.  That is odd, but you might 
have a look at the TS servers and see if they're logging anything else.  Same 
with the cluster: see if there is anything in the security logs.  Might it have 
to do with the hotfix? 
 
Al

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Tue 7/26/2005 1:48 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


Well Al, so far I have figured out that the cluster account requires a 
password of 15 characters or more without SP1 or the hotfix for it.  So I 
changed the account password and restarted the services, and both nodes are 
online.  The only problem now is that I only see half the printers on the new 
node, and our shares are inaccessible from the cluster.  I get an error when 
trying to log on as a regular user, not an admin, that states:  YOU DO NOT HAVE 
PERMISSION TO ACCESS YOUR CENTRAL PROFILE LOCATED AT 
\\SERVERNAME\SHARE$\USERNAME.  CONTACT YOUR NETWORK ADMINISTRATOR.  It is 
logged with source Userenv and Event ID 1000.  So now everybody wants to know 
why they can't get their profiles and I am scrambling for an answer.  It's not 
permissions or share permissions; I have opened them wide open.  I can't 
understand it, because it only happens to regular users, and only to users of 
the Terminal Server environment.
 
Today is a crazy day!!!!
 
Nate 

________________________________

From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 2:40 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


I'm interested to hear how it works out.  
 
When I mentioned the HBA, I was thinking more along the lines of ensuring that 
there are no issues with the physical HBA.  When an HBA goes, symptoms are 
often strange and unexpected.  Same for the ports and switches between the 
HBA and the SAN. 
 
Al 

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 1:10 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


Yes, I pulled up the config GUI, read the config, and compared the 
functioning node's config with the failing node's config; they are identical.  
The HBA sees all assigned LUNs as well.  I don't think it is a storage issue.  
I have been on the phone with Microsoft and they said it may be a security 
issue, and for me to reset the cluster account passwords and recycle the 
services on both nodes.  However, I cannot do that until downtime is allowed, 
so I will probably have to try that tonight.  I don't understand their idea of 
it being a password issue, though, because they had me log in as the cluster 
service account, but they said the DCs may have a different password in AD 
than the cluster nodes have in SCM.  They said it doesn't make sense either, 
but for me to try it.
 
Nate

________________________________

From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 12:08 PM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


Have you also verified that the HBA is functioning correctly? 

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 11:21 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


It had WMI access denied errors that entailed ripping apart the repository of 
the WMI database and since WMI was not starting the cluster could not read the 
WMI information and did not see the other node properly.  I used the 
resetquorum switch which failed with a 1067 could not start service error at 
the command line.  Our Microsoft Premier support call entailed doing everything 
I already did, and then they started researching (Google), so I told them I 
would keep troubleshooting, and for them to call me back when they think of 
something as well.  I have confirmed that the WWN on the SAN is the WWN on the 
HBA that is in the failing node, and the configuration is intact for that 
node.  
 
Nathaniel

________________________________

From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 11:01 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


It's this that gives me the heartache: "The SAN still has the configuration 
data for the WWN of the node"
 
In my experience, when troubleshooting, always assume nothing is correct and 
troubleshoot accordingly. Those errors indicate that it cannot talk to the disk 
properly. It's possible that's because the other node owns it, however it is 
also possible that a configuration change has been made at some point. 
 
It pays to be suspicious of the configuration even if you think it has already 
been done a long time ago. It is not a static configuration and it's worth it 
to ensure that it is configured properly. After all, the other node failed for 
a reason right? 
 
I also assume that you used the -resetquorum etc switches (syntax) right? 
 
That looks suspiciously like a disk access error though.  Something about not 
being able to read the disk which may also indicate a failure at a different 
level (HBA for example?)
 
Out of curiosity, what was the failure that the node was exhibiting prior to 
rebuild?
 
Al
 
 
 

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:36 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


You are correct, this is a SAN configuration with JNI FC HBAs.  The node was 
configured and running for a long while before it failed.  The SAN still has 
the configuration data for the WWN of the node, as it was already configured as 
a node previously.  Same node, same card, same WWN, same system, same name, 
everything is the same basically.  In the event log, the only error that 
presents itself is a 1209 error from the system log, source ClusDisk, 
Description: Cluster service is requesting a bus reset for device 
\Device\Clusdisk3Part0.  Other than that it's not logging any other errors.  The 
cluster log is logging the error during starting of the service: PHYSICAL DISK 
<DISK Q:> [DISKARB] FAILED TO READ (SECTOR 12), ERROR 170.   I checked the 
Microsoft site and it talks about when both nodes are coming up at the same 
time, but this is not the case, as one node is already up with resources online 
and everything.
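
When scanning a long cluster log for entries like the DISKARB read failure 
mentioned above, a small filter can save some scrolling.  This is only a 
minimal Python sketch: the function name find_diskarb_read_failures and the 
sample lines are hypothetical, and the pattern is modeled solely on the one 
message quoted above, not on any documented cluster log layout.

```python
import re

# Assumption: the pattern mirrors the single log message quoted in this
# thread; real cluster log lines may be laid out differently.
DISKARB_READ_FAILURE = re.compile(
    r"\[DISKARB\]\s+FAILED TO READ \(SECTOR (\d+)\),\s+ERROR (\d+)",
    re.IGNORECASE,
)


def find_diskarb_read_failures(lines):
    """Return (line_number, sector, error_code) for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DISKARB_READ_FAILURE.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits


# Hypothetical sample input shaped like the message quoted above.
sample = [
    "PHYSICAL DISK <DISK Q:> [DISKARB] FAILED TO READ (SECTOR 12), ERROR 170.",
    "some unrelated log line",
]
print(find_diskarb_read_failures(sample))
```

To run it against a real log, pass an open file object instead of the sample 
list; the log path is site-specific and is not assumed here.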
 
Nathaniel

________________________________

From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 10:20 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


Ruled out storage issues?  Can we assume this is a SAN configuration? And I 
assume that the new node has the appropriate zoning information configured 
correctly for its WWN?  That would be a change of course, but...
 
What do you see in the event log on that node and for the cluster?

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 10:03 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


I did evict the node, forcedcleanup, rebuilt a new member server, joined it to 
the domain, added it as a node to the existing cluster.  This is the result of 
that.  I don't see any way this could be a naming issue as the name resolution 
for DNS and WINS is completely functional from that node to other nodes, and 
vice versa.  Storage config is ruled out because there has not been any change 
in our storage setup.  

________________________________

From: Al Mulnick [mailto:[EMAIL PROTECTED] On Behalf Of Al Mulnick
Sent: Monday, July 25, 2005 9:57 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster


I'm confused.  Why didn't you just evict the failing node and join the new one? 
 Are you sure you don't have a naming issue or perhaps a storage config issue?  
I see nothing about either of those. 
 
Al

________________________________

From: [EMAIL PROTECTED] on behalf of Bahta Nathaniel V Contr NASIC/SCNA
Sent: Mon 7/25/2005 9:49 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster



I did confirm that the cluster service account is a member of the local 
Administrators group on both boxes, that the passwords I entered are 
correct, that the account is not locked out, and that it has the correct user 
rights on the local node.  I wish that were the answer!!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hunter, Laura E.
Sent: Monday, July 25, 2005 9:42 AM
To: [email protected]
Subject: RE: [ActiveDir] OT: Windows 2003 Cluster

Have you explicitly added the cluster service account to the local 
Administrators group on the two nodes?  I had a few bizarre niggling cluster 
issues that were resolved by doing that.  Even though the service account was 
already a local admin on the box by virtue of group membership, the cluster 
service didn't seem to be satisfied until I had specifically added the 
individual user account itself.

HTH

- Laura

> -----Original Message-----
> From: Bahta Nathaniel V Contr NASIC/SCNA
> [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 25, 2005 6:07 AM
> To: [email protected]
> Subject: RE: [ActiveDir] OT: Windows 2003 Cluster
>
> Hey gang,
>
> I have a 2003 cluster and one of the nodes was rebuilt because it was
> failing.  I cannot get the quorum resource to function correctly on
> the new node.
>
> Here is what I have done:
>
> Rebuilt and patched the failing node.
>
> Blocked all group policy I could and put it in a separate OU. 
>
> Used KB article to ensure cluster service account has appropriate
> permissions on node.
>
> Used KB article to ensure LOCAL SERVICE accounts and SERVICE accounts
> have appropriate permissions on the node.
>
> Disabled LMHASH storage requirement of 14 character cluster service
> account password.
>
> Compared services and security on failing node using Resultant Set of
> Policy wizard and verified that both nodes have the same security in
> place.
>
> Regenerated failing WMI database repository on failing node.
>
> Started cluster service on failing node using /fixquorum switch.
>
> Attempted to start cluster service on failing node using /resetquorum
> switch .... It failed to start cluster service producing a 1067 error
>
> Rebuilt quorum from functioning node by copying ChXXX.tmp file from
> source node to failing node in safe mode and renaming ChXXX.tmp to
> CLUSDB
>
> Ran NTBACKUP.EXE on functioning node and backed up the system state,
> restored Cluster Information using the system state backup and used
> option to restore quorum info to all nodes as well.
>
>
> Does anyone have any ideas on how to make the quorum function on the
> new node?  Any help would be appreciated greatly.
>
> Thanks,
>
> Nathaniel Bahta
> GD-NS
> List info   : http://www.activedir.org/List.aspx
> List FAQ    : http://www.activedir.org/ListFAQ.aspx
> List archive:
> http://www.mail-archive.com/activedir%40mail.activedir.org/
>
>