RE: [ActiveDir] Replication issues

joe Thu, 29 Apr 2004 15:49:01 -0700

I went back and looked into the activedir.org archives and found the hotfix number: it is 812499 (http://support.microsoft.com/?scid=812499). It is included in SP4. We applied it quite a long time ago (I want to say over a year ago) and it works fine; we apply it to every new server we spin up (well, not the K3 ones...).

If you are going to roll out SP4 slowly, you may want to see if you can get the hotfix itself in the meantime.

joe

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rimmerman, Russ
Sent: Wednesday, April 28, 2004 9:38 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues

I'm curious to verify if the password chaining thing was fixed in SP3 or SP4, as we are still experiencing that issue. Some of our domain controllers are on SP3 and some are on SP4. We set SP3 as a company-wide standard for Win2k, but some of our other divisions took it upon themselves to upgrade without telling us! At any rate, that is exactly the problem we are seeing!

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
Sent: Wednesday, April 28, 2004 6:48 AM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues

1. What do you think your replication latency is supposed to be based upon your knowledge of your topology and your link configurations? This isn't something you have to guess at. Look at your DC placement and your replication topology and it will tell you the exact theoretical max replication period you have.

2. What do you want it to be?

30-60 minutes would be a time frame for replication that means you changed the default link settings. The default is 180 minutes per link (hop). This can be reduced to as low as 15 minutes without change notification, and if you enable change notification it can go down to seconds (depending on how busy the bridgeheads between the sites are). As a rule, people don't generally set up change notification across a WAN [1].

30-60 minutes could mean that you have 2-4 hops to get to the site with 15 minute delays, or 1-2 hops with 30 minute delays, or 1-2 hops with 15 minute delays plus lots of DCs in each site, so that it takes another 15 minutes to reach the proper outgoing bridgehead for each site. There are lots of valid reasons for the timing; you need to understand what your theoretical maxes could be and then decide if you are outside of that. If you are outside of that, the first thing I would do is look at the DRA Pending Queue on the servers in the replication path to make sure it is zeroing out every replication period. [2]
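To make the hop arithmetic concrete, here is a minimal sketch in Python. It assumes a deliberately simple model that is not taken from any Microsoft tool: a change just misses the replication window on every link, so each hop costs its full interval, plus an optional fixed intrasite delay to reach the outgoing bridgehead. The function name and parameters are made up for illustration.

```python
def max_replication_latency(link_intervals_min, intrasite_delay_min=0):
    """Rough worst-case latency in minutes along one replication path.

    link_intervals_min: replication interval of each site link (hop), in order.
    intrasite_delay_min: assumed extra time per hop for the change to reach
        the outgoing bridgehead inside a site (0 if you ignore it).
    """
    if any(interval <= 0 for interval in link_intervals_min):
        raise ValueError("link intervals must be positive")
    if intrasite_delay_min < 0:
        raise ValueError("intrasite delay cannot be negative")
    # Worst case: the change waits out the full interval on every hop.
    return sum(interval + intrasite_delay_min for interval in link_intervals_min)


# The scenarios described above, all landing in the 30-60 minute range:
print(max_replication_latency([15, 15]))              # 2 hops at 15 minutes
print(max_replication_latency([15, 15, 15, 15]))      # 4 hops at 15 minutes
print(max_replication_latency([30]))                  # 1 hop at 30 minutes
print(max_replication_latency([15, 15], intrasite_delay_min=15))
# A single hop left at the default interval:
print(max_replication_latency([180]))
```

Plug in the intervals from your own site links along the path from the DC where the change is made to the user's site. If the latency you observe is well above the number this gives you, that is when to start looking at the queues.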

One thing I saw below I wanted to speak about... The out of band password force back to the PDC has been in W2K since RTM at least. It will get that password back immediately unless the PDC is really busy or otherwise unavailable (down, net down, PacMan on the ethernet line eating all of the packets, etc).

Now after all of this I will say you should NOT have to worry about changing passwords at the specific site. Assuming the PDC is available to that site, you should be able to change a password anywhere on any DC and that password will get back to the PDC. Then the client should be able to log on ANYWHERE. What SHOULD happen is that the local DC realizes, hey, this password isn't correct, and does what is called PDC Chaining to ask the PDC whether the password specified is in fact ok [3]. Assuming the password is ok, the PDC will say that is fine and let the user log on. This functionality has been in Windows all the way back to NT. Without it, life in large companies would be miserable.

Now there has been a change in the functionality since 2K RTM to fix what I consider a design flaw / bug in this process. I can't recall exactly when that went in for 2K (SP3?), but it was in K3 RC1; I have written previously about this fix on this list. Basically the issue was that if the user needed to change the password on the next logon and the PDC chaining event occurred, the logon would succeed and the client would be told to display the change password dialogue. The user would respond and enter, as the "old password", the password they had just used to log on. Since that password wasn't yet at the local DC that was handling the change password request, the local DC would say that the old password was incorrect and reject the change. I have already speculated in previous posts to this list about what was happening. Basically it was fixed by sending key information back to the remote DC during a PDC Chaining operation, which brings that DC up to date on some critical authentication information so that it does indeed have the latest password information for that user.
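A toy simulation of the behavior described above, as a minimal Python sketch. Everything in it is hypothetical: the class, the method names, and the `update_local` flag are invented for illustration, passwords are kept as plain strings rather than hashes, and none of it reflects how Windows actually implements this. It only models the logic: fall back to the PDC on a mismatch, and (after the fix) bring the local DC up to date.

```python
class DomainController:
    """Toy DC: knows a password per user and may chain to a PDC."""

    def __init__(self, name, pdc=None):
        self.name = name
        self.pdc = pdc            # None means this DC is the PDC itself
        self.passwords = {}       # user -> password this DC currently knows

    def set_password(self, user, password):
        self.passwords[user] = password
        if self.pdc is not None:
            # Out-of-band push of the new password straight to the PDC.
            self.pdc.passwords[user] = password

    def logon(self, user, password, update_local=True):
        if self.passwords.get(user) == password:
            return True
        if self.pdc is None:
            return False          # the PDC is the final arbiter
        # PDC chaining: the local copy may be stale, so ask the PDC.
        ok = self.pdc.passwords.get(user) == password
        if ok and update_local:
            # Post-fix behavior: the local DC is brought up to date.
            self.passwords[user] = password
        return ok

    def change_password(self, user, old, new):
        if self.passwords.get(user) != old:
            return False          # the local DC rejects the "old password"
        self.set_password(user, new)
        return True


def scenario(update_local):
    pdc = DomainController("pdc")
    hq = DomainController("hq", pdc)
    remote = DomainController("remote", pdc)
    for dc in (pdc, hq, remote):
        dc.passwords["user1"] = "old"
    hq.set_password("user1", "temp")   # help desk reset, not yet replicated
    logged_on = remote.logon("user1", "temp", update_local=update_local)
    changed = remote.change_password("user1", "temp", "new")
    return logged_on, changed


print(scenario(update_local=False))   # original behavior
print(scenario(update_local=True))    # behavior after the fix
```

In the first scenario the logon succeeds through chaining but the forced password change is rejected, because the remote DC still only knows the old password. In the second, the chained logon also updates the remote DC, so the change goes through.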

So all of that is to say that, unless you have horrendous network connectivity, you should not have to set passwords on specific DCs if your domain controllers are at the current patch levels of Windows 2000 or are on Windows 2003.

joe

[1] There are exceptions here, so I am not looking for people to email saying, we are and here's why... There are a couple of special cases where I do it as well - to keep Exchange in a good mood. The exceptions make the rule and show the beauty of the flexibility of the system.

[2] Keep in mind there was a bug in a hotfix or two between SP2 and SP3 that caused this queue counter to not have good values. It would sometimes increment and exit without remembering to decrement. Very unusual, as it will look almost like your queue isn't clearing. In this case, you can pull out repadmin /queue or my adqueueloop to look at the actual queue and verify what it is doing. This is fixed in SP4, and one of the 4 new hotfixes that just came out also corrects it (obviously the bin with that code was replaced in one of the fixes and it has all of the previous fixes in it as well). So if you are at the minimum level you should be for these last three crits, your counters should be working ok.

[3] The DCs realize that they may not have the latest password and go ask the "master" for verification. This is one of the "big" functions of the PDC, being "master" of the passwords. It may not have the current right password, but it is final arbiter on whether or not a certain password can be used if another DC isn't sure.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Coleman, Hunter
Sent: Tuesday, April 27, 2004 10:46 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues

It's strictly a judgment call. You decide how important it is to have password changes replicate *now* and then weigh that against the costs of having very low replication latency. Costs might include available bandwidth, other applications using the same network, etc...

In general, I'd stay away from letting this be the driving factor in determining your replication schedule. Change the password in the user's site, and 99% of the time the user should be fine within 15 minutes (default intrasite maximum replication period if you have 5 or more DCs in the site) or less.

From: Rimmerman, Russ [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 7:40 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues

What does changing the replication schedules explicitly for password resets entail, and is it recommended?

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Coleman, Hunter
Sent: Tuesday, April 27, 2004 8:25 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues

Unless you want to start changing your replication schedules explicitly for password resets, you're doing the right thing. Change the password on a DC in the user's site. If you're at SP4 (I think, could have been SP3) then the password change will also get sent on to the PDC emulator immediately. Anytime a user enters an incorrect password, the local DC will pass on the request to the PDCE in case the password had changed on a different DC.

The Account Lockout Status tool is probably the best utility for checking on password replication. Among other things, it will show the timestamp for password last set on each domain controller, so you can have a good idea of the replication state on the change. http://www.microsoft.com/downloads/details.aspx?FamilyID=d1a5ed1d-cd55-4829-a189-99515b0e90f7&DisplayLang=en (watch for URL wrap)

Hunter

From: Rimmerman, Russ [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 7:07 AM
To: '[EMAIL PROTECTED]'
Subject: [ActiveDir] Replication issues

We have always been having weird issues with replication. We have about 30 AD sites all over the world. When we change or reset a password here for a user at a remote site, it takes quite a long time (30-60 minutes or more) to replicate to the user's site. So, we are having to connect to their local domain controller and reset the password there. What is the best practice for setting up and tuning replication and resetting passwords, and what tools are recommended (replmon?) for "testing" it, and how long should it take?

