Richard,
Thank you very much for the information. We are going
to take a pass on SP4 until seeing the documentation on 828297 and doing some
more testing.
Side question - you mention running specific stress tests
when you test 828297. What tools or programs are you using to do
this?
Thanks again for the information. You
have saved us a huge amount of grief.
-Stuart Fuller
State of Montana
From: Puckett, Richard [mailto:[EMAIL PROTECTED]
Sent: Friday, September 19, 2003 2:48 PM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] SP4 or not SP4? (hotfixes 824226 & 828297)
Stuart,
We originally installed SP4 near the
beginning of August on all of our production Domain Controllers after testing it
in our (mirror of production) lab. Within two production workdays we began
to see the same issues Vladimir mentioned in his BUGTRAQ e-mail and we opened a
case with MS. Since the problem was readily identifiable, we were able to
get a copy of KB824226, which we tested, then installed. Later
in the week we found that KB824226 had introduced a previously unknown
LSASS problem: global heap allocations were not being released
(below are a few of the telltale signs of a post-KB824226
DC in resource distress). The resulting resource starvation caused
most of the directory service-related functions to fail (failed replication,
logons, LDAP queries, etc.). At first we were concerned that the problems
might have been related somehow to the RPC/DCOM vulnerability being exploited by
potentially infected hosts on our network, but further analysis ruled this
out.
We worked with MS for approximately two weeks to find a
resolution for the problem, providing ADPerf, Event, UMDH and LSASS dump
data. Eventually KB828297 came into existence from the
analysis of data that we and other customers were
providing. Though MS did work hard to locate and correct the error,
KB828297 did not appear in a timely enough fashion for us to use, and
with more and more DCs failing we made the decision to back
out of SP4 to regain host stability, regressing to SP3.
We're currently running SP4 in one of our lab
configurations and are preparing to test KB828297 with some very specific stress
tests to ensure we don't encounter any new issues before re-deploying SP4.
Hope this data helps,
Richard
Post-KB824226 Early (and Late) Resource Consumption Warning Signs
Event Type: Error
Event Source: KDC
Event Category: None
Event ID: 7
Date: 8/15/2003
Time: 3:44:00 PM
User: N/A
Computer: <DOMAIN CONTROLLER NAME>
Description:
The Security Account Manager failed a KDC request in an unexpected way. The error is in the data field. The account name was host/<workstation fqdn> and lookup type 0x48.
Data:
0000: 17 00 00 c0 ...À
Event Type: Warning
Event Source: NTDS General
Event Category: Internal Processing
Event ID: 1519
Date: 8/15/2003
Time: 12:59:50 PM
User: Everyone
Computer: <DOMAIN CONTROLLER NAME>
Description:
A Directory Service operation failed because the database has run out of version storage. If this error repeats frequently it most likely indicates that an object that is too large for the Directory Service to handle is attempting to replicate in. This object must be deleted or shrunk on a Directory Server where it already exists.
The internal id is 2020743.
Event Type: Error
Event Source: NTDS General
Event Category: Internal Processing
Event ID: 1168
Date: 8/20/2003
Time: 11:52:44 PM
User: DOMAIN\userid
Computer: <DOMAIN CONTROLLER NAME>
Description:
Error 8(8) has occurred (Internal ID 302022c). Please contact Microsoft Product Support Services for assistance.
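For anyone reading the Data field of the KDC event above: the four bytes are stored little-endian, so "17 00 00 c0" reads as the 32-bit value 0xC0000017, which ntstatus.h defines as STATUS_NO_MEMORY. That is consistent with the resource exhaustion described in this message. The short Python sketch below shows the decoding. The function names and the two-entry lookup table are illustrative assumptions for this note, not part of any Microsoft tool.

```python
import struct

# Small hand-picked lookup table (illustrative, not exhaustive).
# Values are taken from ntstatus.h.
KNOWN_NTSTATUS = {
    0xC0000017: "STATUS_NO_MEMORY",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}


def decode_event_data(hex_bytes: str) -> int:
    """Convert an Event Viewer data row such as '17 00 00 c0' to a 32-bit code.

    The bytes are displayed in memory order (little-endian), so they are
    unpacked as an unsigned little-endian 32-bit integer.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes of event data")
    (code,) = struct.unpack("<I", raw)
    return code


def describe(code: int) -> str:
    """Format a status code with its symbolic name when the table knows it."""
    name = KNOWN_NTSTATUS.get(code, "unknown status")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    print(describe(decode_event_data("17 00 00 c0")))
```

The "Error 8(8)" in event 1168 appears to point the same way: Win32 error 8 is ERROR_NOT_ENOUGH_MEMORY.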
From: Fuller, Stuart [mailto:[EMAIL PROTECTED]
Sent: Friday, September 19, 2003 2:24 PM
To: '[EMAIL PROTECTED]'
Subject: [ActiveDir] SP4 or not SP4? (hotfixes 824226 & 828297)
I *was* planning to go ahead and install SP4 on all of our production DCs this weekend. We have successfully tested it on our test bench and as a pilot in a small separate forest.
However, I have been following the notes by Vladimir Markovic on the NTbugtraq mailing list about LSASS and LDAP, and those are making me a bit nervous, to say the least. (These notes deal with hotfixes 824226 and 828297.)
I would like any comments from admins on the list with real-world experience with SP4 and AD. Specifically, those people running larger production environments (1,000+ users) and using applications that authenticate against AD via LDAP (e.g. PeopleSoft, Digite/Tufan, etc.). Has anyone else experienced the problems described in 824226?
I have looked at the posts on Google from the Microsoft newsgroup, and there do seem to be other admins who have been affected by this. I am trying to get a sense of whether this is a global problem or is limited to specific "unique" environments.
Thanks,
Stuart Fuller
AD Dweeb
State of Montana
