RE: [ActiveDir] Replication issues

joe Sun, 02 May 2004 05:42:46 -0700

Cool looks like my ISP backed up on their SMTP outbound again... I sent this thing Friday morning and it looks like it hit the list Sunday morning...

joe

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
Sent: Friday, April 30, 2004 7:49 AM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues

There are three types of replication when you are talking about passwords.

1. Urgent replication. When a password changes anywhere, the DC sends out an urgent replication notification (i.e., no holdback on the notification, it goes now - use my adqueueloop to watch for it). Again, the last time I looked, the priority is the same as any normal partition change (i.e., higher than a GC change but no higher than, say, the change of a description in the default partition). So basically this goes into the queue and zips around the site (or anywhere change notification is enabled) fairly quickly, then stops dead when it hits the site walls and waits for the site link configurations (again, change notification configured for site links can modify this). I actually mention that below. This type of replication uses all of the normal replication mechanisms, so if your inbound thread is tied up on a DC with lots of default partition changes you could see all kinds of delays getting the change around (this is why you monitor DRA Pending - it needs to go to zero every replication period).

2. Immediate Replication. This is when the PDC is contacted via a specific RPC call to update the password when it is changed on another DC. This does not use the normal replication engine so it isn't impacted by normal replication delays. This functionality has been in there since OEM. It is best effort: it will try to get the change back to the PDC, but if something prevents it (busy PDC, network issues, dead PDC, etc.) then the change just goes through the normal #1 replication. Also, once again, the AvoidPDConWAN setting impacts this; it completely disables it unless the PDC is in the same site as the DC where the password change occurred. If you are NOT seeing this, I highly recommend auditing the DCs to make sure that the reg setting isn't set, that your PDC is working correctly, and that the network is ok.

3. Single User Object On Demand Replication, or simply On Demand Replication. This is when a PDC chaining event occurs and the PDC immediately pushes the user info down to the DC that did the chaining. This is what 812499 adds to the mix (also K3 RC1). This does not use the standard replication engine; it is completely out of band. This functionality does not exist pre-812499; it is a bug fix (well, they didn't consider it a bug, but everyone who has had to deal with it has). It was a hole in the design and wasn't the intended customer experience. Replication delays will not impact this; however, the AvoidPDConWAN setting would, because you don't chain when that is set. See the section of the previously mentioned doc called Single User Object On Demand Replication.
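
The "DRA Pending needs to go to zero every replication period" check from item 1 can be roughed out in a few lines of Python. This is only a sketch: the sample format (seconds since start, queue length) and the function name are made up for illustration, and in practice the numbers would come from the DC's performance counter or from repadmin /queue.

```python
from typing import Iterable, List, Tuple

def periods_without_zero(samples: Iterable[Tuple[float, int]],
                         period_seconds: float) -> List[int]:
    """Return the indexes of replication periods in which the pending
    queue never dropped to zero.

    samples: (seconds_since_start, pending_count) pairs. This is a
    hypothetical export format, not the output of any real tool.
    """
    reached_zero = {}
    for t, pending in samples:
        idx = int(t // period_seconds)
        reached_zero[idx] = reached_zero.get(idx, False) or pending == 0
    return sorted(i for i, ok in reached_zero.items() if not ok)

# 15-minute (900 second) periods; the second period never drains.
samples = [(0, 12), (300, 0), (900, 40), (1200, 35),
           (1790, 8), (1800, 3), (2000, 0)]
stuck = periods_without_zero(samples, period_seconds=900)
print(stuck)
```

Any period index that shows up in the result is one where the inbound queue stayed backed up and is worth a closer look.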

Locked out accounts again, this is a different story. I recall reading the diffs previously but don't have them on the tip of my tongue. PDC chaining does not occur in the same way for a locked out account. There is a difference. Account lockouts really shouldn't be happening a lot to normal users and not at all to admins unless they

1. Have an old crappy client

2. Have some bad software that does stupid things (old versions of outlook with an expired account for instance can generate hundreds of auths a second)

3. Are being attacked.

4. Are a bonehead

or you

1. Have the lockout policy set to some insane setting (like 3 bads and locked forever, or for an hour, or whatever. You want 3 bads and a lockout? Fine, unlock in 5 min then).

All of those are correctable. I think you were around when I got in a fight with HP's first level folks because I took away their ability to unlock each other's accounts. They said they needed it because they kept getting locked out. My response was they needed to get a little smarter and be careful and actually know what they are doing versus just being a clicking bump on a log (man did I get in trouble for that one...). That was a very unhappy fight for them and they lost, but the number of lockouts on that team went down dramatically.

If you have a lockout policy that tends towards locking out valid users, it needs to be reviewed. The concept of the lockout policy is to prevent cracking of passwords by keeping enough password attempts from making it through. Careful control of the lockout policy tied with the password policy is how this is done correctly. You can turn up how many bads it takes to get a lockout and turn down how fast it unlocks automatically if you have a decent password policy. The longer the passwords your policy requires, the higher the lockout bad count can be. We have a policy standard of 5 bads which I think is ridiculously low. Due to bugs in Win9x it is currently set at 15 bads which is more realistic. Unlock time is 15 minutes as well. This means there could be ~60 attempts an hour which shouldn't be enough to compromise any decent password in any real time. Note if you have avoidpdconwan set, you have a possible security issue here which you should be thinking about and testing - especially if you have a lot of DCs that are all on the WAN and reachable from a single location.
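
The ~60 attempts an hour figure is just threshold times unlock cycles per hour. A quick Python sketch of that arithmetic follows. It assumes the attacker uses every allowed attempt and that the account unlocks exactly on schedule, and the independent_dcs parameter is my own simplification of the avoidpdconwan concern (each reachable DC counting bad attempts separately), not something measured.

```python
def guesses_per_hour(bad_threshold, unlock_minutes, independent_dcs=1):
    """Worst-case guesses per hour against one account.

    independent_dcs is a hypothetical knob: the number of DCs an
    attacker can reach that each keep their own bad-password count.
    """
    lockout_cycles_per_hour = 60.0 / unlock_minutes
    return bad_threshold * lockout_cycles_per_hour * independent_dcs

def years_to_exhaust(alphabet_size, length, per_hour):
    """Years needed to try every password of a given length at that rate."""
    keyspace = alphabet_size ** length
    return keyspace / per_hour / (24 * 365)

rate = guesses_per_hour(bad_threshold=15, unlock_minutes=15)
print(rate)  # 60.0

# Upper and lower case letters only (52 symbols), 15 characters.
print(years_to_exhaust(52, 15, rate))
```

Plugging in a larger independent_dcs value shows how quickly the effective attempt rate grows when bad-password counts are not funneled back to one place.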

If I had my druthers for a normal corporate environment, I would see password policies of like 20 bad, unlock in 15. Passwords of 15 characters or better. Simple complexity, BOTH upper and lower case. Obviously the MS Complexity filter doesn't fit that, but you can get that out of products like PSYNCH or just write your own filter. Passwords are changed at least every 84-91 days (multiple of 7) - NO NON-EXPIRING IDS EVER. Admin passwords changed every 30 days and this shouldn't have to be enforced by the system, you should be able to tell your admins, hey make sure your passwords don't go over 30 days - they shouldn't be logging on interactively to their workstations so they shouldn't be getting notifications for them most of the time anyway when they are approaching expiration. Admin passwords should NOT be in sync with normal passwords and probably should be longer than 15 characters, people start thinking pass phrases... Obviously the longer passwords won't work in an environment where you insist on keeping mainframes and other systems that can only support short passwords and you insist on syncing your IDs instead of using, say, Kerberos authentication across platforms...
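
The "15 characters or better, BOTH upper and lower case" rule is easy to express. The Python below only sketches the rule logic; a real password filter on a DC is a native DLL loaded by the system, and the function name and default here are made up for illustration.

```python
def meets_simple_policy(password: str, min_length: int = 15) -> bool:
    """True if the password is long enough and mixes upper and lower case."""
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    return len(password) >= min_length and has_upper and has_lower

print(meets_simple_policy("Correct horse battery"))      # long, mixed case
print(meets_simple_policy("Short1Aa"))                   # too short
print(meets_simple_policy("all lowercase pass phrase"))  # no upper case
```

Note that a pass phrase with one capital letter passes, which is the point: length does the heavy lifting rather than character-class gymnastics.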

joe

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of deji Agba
Sent: Friday, April 30, 2004 1:34 AM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues

The password will get replicated "out of band" [1] back to the PDC on a
password change. See
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/security/bpactlck.mspx,
specifically check the piece on "immediate replication".

I missed this. Let's hope I don't get smacked too hard for it. But, are you saying password change qualifies for "immediate" (or urgent) replication? Not according to this:

By default, urgent replication does not occur across site boundaries. Because of this, administrators should make manual password changes and account resets on a domain controller that is in that user's site.

This is what acctinfo addressed. This was the problem I was facing a year ago. My helpdesk admins in Santa Clara reset an EMEA (or Tokyo) user's password. They call up the user and say "here's your password", user tries it and hits the lockout threshold, BAM! user is locked out. User gets really PO'ed because now he can't get helpdesk, because helpdesk had left for the day shortly after calling user. I unlock user's account, which now triggers urgent replication, tell user "wait for about 5-10 minutes and try it". User is then able to login and make that million dollar sales presentation. I get bonus, and I'm still employed because I'm the "Guru". Helpdesk gets the shaft and they are pissed at me for not telling them about this "feature".

Now, I will shut up. Really :)

Sincerely,

Deji Akomolafe, MCSE MCSA MCP+I

Microsoft MVP - Directory Services

www.readymaids.com - we know IT
www.akomolafe.com
Do you now realize that Today is the Tomorrow you were worried about Yesterday? -anon

From: joe
Sent: Thu 4/29/2004 3:43 PM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues

The password will get replicated "out of band" [1] back to the PDC on a
password change. See
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/security/bpactlck.mspx,
specifically check the piece on "immediate replication".



"Theoretically, there should be no need for these tools, but in reality,
chaining did not work as designed."

Yes it actually does, I see it in action every single day. We process
thousands of password requests a day. It does work. Wherever the password is
changed, it gets back to the PDC and then whatever DC is hit, the request is
chained back to the PDC to allow the authentication. 


"before the locking out DC learns about the reset."

Lockouts are handled differently. Dig into the documentation. An unlock has
some special stuff around it in terms of how often it will go back and
check. I don't recall the details, however, not every attempt is sent back
to the PDC when the account is locally locked. I believe the logic was put
in to protect the PDC from being DOSed by things like viruses and such that
pound the DCs.


The "AvoidPDConWAN" will of course change the default functionality, that is
what it was designed to do. If someone blindly applied it without
understanding the repercussions, they deserve everything that happens to
them. See http://support.microsoft.com/default.aspx?scid=kb;EN-US;232690 /
http://support.microsoft.com/?kbid=225511 for more info on AvoidPDConWan
setting.
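
If you want to audit a batch of DCs for the setting, something like the Python below can generate the reg query command lines to run. Treat it as a sketch: the registry path and value name are what I understand the KB articles above to describe (verify against them before relying on it), the DC names are placeholders, and nothing is executed here.

```python
# Registry location as I understand it from the KB articles; confirm it
# against the KB before trusting the result.
NETLOGON_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"
VALUE_NAME = "AvoidPdcOnWan"

def audit_commands(dc_names):
    """Build one 'reg query' command line per DC (nothing is run)."""
    return ['reg query "\\\\{}\\{}" /v {}'.format(dc, NETLOGON_PARAMS, VALUE_NAME)
            for dc in dc_names]

# DC01 and DC02 are placeholder names.
commands = audit_commands(["DC01", "DC02"])
for line in commands:
    print(line)
```

A DC where the query fails to find the value is presumably running with the default behavior, but check that assumption on one known-good DC first.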

One other thing I want to point out that is usually documented horribly.
Password changes are urgently replicated within a site, not to all domain
controllers. So if you change a password, you will go through urgent
notification (i.e. bypassing the holdback time) within the site and those
DCs will replicate in an urgent manner [2]. Once you hit site boundaries
that are living with normal site link replication periods then you wait for
that replication period to come up to get that password sent across. So if
you have a 4 day wait on the link, then you wait that long to get that
replication through. If you don't have avoidpdconwan set though and you have
good connectivity, this will not be an issue. If you do, the very fact that
you set that setting means you WANT to have to go change the password on the
DC the user is using. In a simple environment this is a trivial thing to
work out (assuming proper configuration everywhere). In a large complex
environment this can be decidedly non-trivial.



  joe




[1] A specific RPC call is made. I have seen this in action with one of my
tools that watches DCs for changes and notifies on object modifications. The
longest delay I have seen has been about 500ms. However if the PDC is for
some reason unavailable, this call will fail and the password will get back
to the PDC through the standard replication methods.

[2] I don't believe however that the priority is any higher than any other
domain context change, just simply the notification is urgent which means
that if there is a queue on the inbound thread on what it is working on, it
will get thrown at the bottom of the items with the same priority.

 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 7:30 PM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues

>>It will get that password back immediately unless the PDC is really busy or
>>otherwise unavailable
The way I'm reading this is that you are saying password change will trigger
immediate replication to the PDCE. In my experience (which I don't have to
describe to you :)), this is not the case. Also, I may be misreading you
here, because, further down, you said:
 
>>What SHOULD happen is that the local DC should realize, hey this password
>>isn't correct and will do what is called a PDC Chaining to ask the PDC
>>if the password specified is in fact ok [3]
This is the way it works, I agree here.
 
Now, you also said:
>>Assuming the PDC is available to that site, you should be able to change a
>>password anywhere on any DC and that password will get back to the PDC.
This, too, is correct.
 
However the problem is the time it takes for the password change to get back
to the PDCE and then onward to the rest of the DCs. Where neither the
HelpDesk (who reset the password) nor the User (whose password was reset) is
in the site where the PDCE is located, the length of time it takes for the
password change to travel across the wire is usually unacceptable. This is
the reason one would want to reset the password at a DC local to the User.
This is also one of the reasons for ALTools, especially the AcctInfo.dll
part.
Theoretically, there should be no need for these tools, but in reality,
chaining did not work as designed. One DC would lock out a user's account,
after the user's password had been reset on another DC, before the locking
out DC learns about the reset.
 
Lastly, I have come across canned recommendations from "security consultants"
telling clients to enable the AvoidPDConWAN registry key. I am sure some
companies would have heeded that recommendation.
 
 
Sincerely,

Deji Akomolafe, MCSE MCSA MCP+I
Microsoft MVP - Directory Services
www.readymaids.com - we know IT
www.akomolafe.com
Do you now realize that Today is the Tomorrow you were worried about
Yesterday?  -anon

________________________________

From: [EMAIL PROTECTED] on behalf of joe
Sent: Wed 4/28/2004 4:47 AM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Replication issues


1. What do you think your replication latency is supposed to be based upon
your knowledge of your topology and your link configurations? This isn't
something you have to guess at. Look at your DC placement and your
replication topology and it will tell you the exact theoretical max
replication period you have. 
 
2. What do you want it to be?
 
 
30-60 minutes would be a time frame for replication that means you changed
the default link settings. The default is 180 minutes per link (hop). This
can be reduced to as low as 15 minutes without change notification and if
you enable change notification it can go down to seconds (based on how busy
the bridgeheads between the sites are). As a rule, people don't generally
set up change notification across a WAN [1]. 30-60 minutes could mean that
you have 2-4 hops to get to the site with 15 minute delays or it could be
you have 1-2 hops with 30 minute delays or it could be 1-2 hops with 15
minute delays with lots of DCs in each site and it taking 15 minutes to get
to the proper outgoing bridgehead for each site. Lots of valid reasons for
the timing; you need to understand what your theoretical maxes could be and
then decide if you are outside of that. If you are outside of that, the first
thing I would do is look at the DRA Pending Queue on the servers in the
replication path to make sure it was zeroing out every replication period. [2]
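
The theoretical max works out to simple addition over the hops. Here is a small Python sketch of the scenarios above; the function name is made up, and the model (each hop costs its site link interval plus an optional intrasite delay to reach the outgoing bridgehead) is a simplification for illustration.

```python
def worst_case_latency(hop_intervals, intrasite_delay=0):
    """Worst-case minutes for a change to cross every hop in the path.

    hop_intervals: site link replication interval, in minutes, per hop.
    intrasite_delay: minutes to reach the outgoing bridgehead in each
    site before the hop (a simplifying assumption).
    """
    return sum(interval + intrasite_delay for interval in hop_intervals)

print(worst_case_latency([15, 15]))                      # 2 hops at 15 minutes
print(worst_case_latency([15] * 4))                      # 4 hops at 15 minutes
print(worst_case_latency([30]))                          # 1 hop at 30 minutes
print(worst_case_latency([15, 15], intrasite_delay=15))  # busy sites
print(worst_case_latency([180, 180]))                    # 2 hops at the default
```

Compare the number you get for your own path against what you actually observe; only if you are well past it is something wrong.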
 
One thing I saw below I wanted to speak about... The out of band password
force back to the PDC has been in W2K since RTM at least. It will get that
password back immediately unless the PDC is really busy or otherwise
unavailable (down, net down, PacMan on the ethernet line eating all of the
packets, etc). 
 
Now after all of this I will say you should NOT have to worry about changing
passwords at the specific site. Assuming the PDC is available to that site,
you should be able to change a password anywhere on any DC and that password
will get back to the PDC. Then the client should be able to log on ANYWHERE.
What SHOULD happen is that the local DC should realize, hey this password
isn't correct and will do what is called a PDC Chaining to ask the PDC
if the password specified is in fact ok [3]. Assuming the password is ok,
the PDC will say, that is fine and let the user log on. This functionality
has been in Windows all the way back in NT. Without it, life in large
companies would be miserable.
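
To make the flow concrete, here is a toy Python model of the push-to-PDC and PDC chaining behavior described above. It is purely illustrative: the class and attribute names are invented, passwords are compared as plain strings (real DCs do nothing of the sort), and the avoid_pdc_on_wan flag ignores the same-site exception.

```python
class DomainController:
    """Toy DC: a local password table plus an optional link to the PDC."""

    def __init__(self, name, pdc=None, avoid_pdc_on_wan=False):
        self.name = name
        self.pdc = pdc                      # None means this DC is the PDC
        self.avoid_pdc_on_wan = avoid_pdc_on_wan
        self.passwords = {}

    def _can_reach_pdc(self):
        return self.pdc is not None and not self.avoid_pdc_on_wan

    def set_password(self, user, password):
        self.passwords[user] = password
        # Best-effort out-of-band push of the change back to the PDC.
        if self._can_reach_pdc():
            self.pdc.passwords[user] = password

    def authenticate(self, user, password):
        if self.passwords.get(user) == password:
            return True
        # Local mismatch: chain to the PDC before rejecting the logon.
        if self._can_reach_pdc():
            return self.pdc.passwords.get(user) == password
        return False

pdc = DomainController("PDC")
helpdesk_dc = DomainController("DC-A", pdc)
user_dc = DomainController("DC-B", pdc)
for dc in (pdc, helpdesk_dc, user_dc):
    dc.passwords["jdoe"] = "OldPass"

helpdesk_dc.set_password("jdoe", "NewPass")
print(user_dc.authenticate("jdoe", "NewPass"))  # chained to the PDC
print(user_dc.passwords["jdoe"])                # local copy is still stale
```

The second print is the interesting one: the user's DC still holds the old password, yet the logon succeeds because the PDC vouches for the new one.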
 
Now there has been change in the functionality since 2K RTM to fix what I
consider a design flaw / bug in this process. I can't recall when that
exactly went in for 2K (SP3?) but was in K3 RC1; I have written previously
about this fix on this list. Basically the issue was if the user needed to
change the password on the next logon and the PDC chaining event occurred,
the logon would succeed and client would be told to display the change
password dialogue. The user would respond and use the "old password" of the
password they just used to logon. Since that password wasn't yet at the
local DC that was handling this change password request the local DC would
say that the old password was incorrect and reject the change. I have
already speculated in previous posts to this list about what was happening.
Basically it was fixed by sending back key information to the remote DC
during a PDC Chaining operation that brought that DC up to date for some
critical authentication information so that it did indeed have the latest
password information for that user. 
 
So all of that to say, that unless you have horrendous network connectivity,
you should not have to set passwords on specific DCs if you are up to the
current patch levels of Windows 2000 or on Windows 2003 for your domain
controllers. 
 
  
   joe
 
 
 
 
[1] There are exceptions here so I am not looking for people to email saying,
we are and here's why... There are a couple of special cases where I do it
as well - to keep exchange in a good mood. The exceptions make the rule and
show the beauty of the flexibility of the system.
 
[2] Keep in mind there was a bug in a hotfix or two between SP2-3 that
caused this queue to not have good values. It would increment sometimes and
exit without remembering to decrement. Very unusual as it will look almost
like your queue isn't clearing. In this case, you can pull out repadmin
/queue or my adqueueloop to look at the actual queue and verify what it is
doing. This is fixed in SP4 and actually one of the 4 new hotfixes that just
came out also corrects it (obviously the bin with that code was replaced in
one of the fixes and it has all of the previous fixes in it as well). So if
you are at the minimum you should be for these last three crits, your
counters should be working ok. 
 
[3] The DCs realize that they may not have the latest password and go ask
the "master" for verification. This is one of the "big" functions of the
PDC, being "master" of the passwords. It may not have the current right
password, but it is final arbiter on whether or not a certain password can
be used if another DC isn't sure. 
 
 

________________________________

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Coleman, Hunter
Sent: Tuesday, April 27, 2004 10:46 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues


It's strictly a judgment call. You decide how important it is to have
password changes replicate *now* and then weigh that against the costs of
having very low replication latency. Costs might include available
bandwidth, other applications using the same network, etc...
 
In general, I'd stay away from letting this be the driving factor in
determining your replication schedule. Change the password in the user's
site, and 99% of the time the user should be fine within 15 minutes (default
intrasite maximum replication period if you have 5 or more DCs in the site)
or less.

________________________________

From: Rimmerman, Russ [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 7:40 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues


What does changing the replication schedules explicitly for password resets
entail, and is it recommended?

________________________________

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Coleman, Hunter
Sent: Tuesday, April 27, 2004 8:25 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [ActiveDir] Replication issues


Unless you want to start changing your replication schedules explicitly for
password resets, you're doing the right thing. Change the password on a DC
in the user's site. If you're at SP4 (I think, could have been SP3) then the
password change will also get sent on to the PDC emulator immediately.
Anytime a user enters an incorrect password, the local DC will pass on the
request to the PDCE in case the password had changed on a different DC.
 
The Account Lockout Status tool is probably the best utility for checking on
password replication. Among other things, it will show the timestamp for
password last set on each domain controller, so you can have a good idea of
the replication state on the change.
http://www.microsoft.com/downloads/details.aspx?FamilyID=d1a5ed1d-cd55-4829-a189-99515b0e90f7&DisplayLang=en
(watch for URL wrap)
 
Hunter

________________________________

From: Rimmerman, Russ [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 27, 2004 7:07 AM
To: '[EMAIL PROTECTED]'
Subject: [ActiveDir] Replication issues


We have always been having weird issues with replication.  We have about 30
AD sites all over the world.  When we change or reset a password here for a
user at a remote site, it takes quite a long time (30-60 minutes or more) to
replicate to the user's site.  So, we are having to connect to their local
domain controller and reset the password there.  What is the best practice
for setting up and tuning replication and resetting passwords, and what tools
are recommended (replmon?) for "testing" it, and how long should it take?
List info   : http://www.activedir.org/mail_list.htm
List FAQ    : http://www.activedir.org/list_faq.htm
List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
