(Sorry about the long-winded email.)
Hi all,
I'm hoping for a little insight here so I can avoid another evening like
last night.
The client has a main office with a PDC and a second DC, plus one satellite
office with one DC. The PDC and the backup DC both have DNS installed. At 6pm,
authentication to anything at the main site started failing. I could
not log on to either DC in the main office with the admin user, and I could
not log on to the TS with my own user account. The remote office is a time
zone behind and was still working, and I received calls that everyone had
lost their connection to the Exchange server at the main office.
Authentication from the remote DC continued to function.
I rebooted both DCs at the main office, AD authentication resumed, and
dcdiag came back clean. Looking at the logs I can see what happened, but I
don't know why it was so catastrophic.
It looks like the PDC's F drive, which is a fiber-connected RAID array,
hiccuped for some reason and was unavailable for about one minute.
The first error in the Directory Services log is:
NTDS (544) NTDSA: An attempt to write to the file "F:\NTDS\edb.log" at
offset 10049536 (0x0000000000995800) for 512 (0x00000200) bytes failed
after 23 seconds with system error 2 (0x00000002): "The system cannot
find the file specified. "
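
For anyone who wants to pull the key fields out of that event, here is a minimal Python sketch. The regular expression and the function name `parse_write_failure` are illustrative assumptions, not part of any Windows tooling; the event text is copied from the entry above. It only shows that the failed write was a single 512-byte write to the transaction log on F:, and that the decimal and hex figures in the message agree.

```python
import re

# Event text copied from the Directory Services log entry quoted above.
EVENT = (
    'NTDS (544) NTDSA: An attempt to write to the file "F:\\NTDS\\edb.log" at '
    'offset 10049536 (0x0000000000995800) for 512 (0x00000200) bytes failed '
    'after 23 seconds with system error 2 (0x00000002): "The system cannot '
    'find the file specified. "'
)

# Hypothetical pattern written for this one message format only.
PATTERN = re.compile(
    r'write to the file "(?P<path>[^"]+)" at offset (?P<offset>\d+) '
    r'\((?P<offset_hex>0x[0-9A-Fa-f]+)\) for (?P<size>\d+) '
    r'\((?P<size_hex>0x[0-9A-Fa-f]+)\) bytes failed after (?P<secs>\d+) '
    r'seconds with system error (?P<err>\d+)'
)


def parse_write_failure(text):
    """Return the fields of a write-failure event, or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    offset = int(m.group("offset"))
    size = int(m.group("size"))
    return {
        "path": m.group("path"),
        "offset": offset,
        "size": size,
        "seconds": int(m.group("secs")),
        "error": int(m.group("err")),
        # True when the decimal and hex values in the message agree.
        "consistent": offset == int(m.group("offset_hex"), 16)
        and size == int(m.group("size_hex"), 16),
    }


info = parse_write_failure(EVENT)
print(info)
```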
At this point, the DC stopped acting as a DC and rejected all
authentication requests until it was rebooted. I don't
understand a couple of things. First, I was actually surprised that
there is an NTDS folder on the F drive. This is the file server, and I
certainly wouldn't want NTDS on the same partition as the company file
shares. There is also an NTDS folder under C:\WINDOWS, and I see this in
the logs this morning, which I assume means it is using that
partition for NTDS:
NTDS (544) NTDSA: Online defragmentation has completed a full pass on
database 'C:\WINDOWS\NTDS\ntds.dit'.
Can anyone help me understand why it would exist on both drives and, if
it is on both drives, why this hiccup would bring the network to its knees?
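
One possible reading of the two log entries, offered as an assumption rather than something the logs confirm: the database file and its transaction log may simply be configured on different drives. The short Python sketch below compares only the two paths quoted above; the helper names `drive_of` and `on_separate_drives` are made up for illustration. To check the real configuration, the NTDS service parameters under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters (values "DSA Database file" and "Database log files path") should show where each actually lives.

```python
import ntpath  # Windows path rules, usable from any platform

# The only two NTDS paths that appear in the logs quoted above.
LOG_PATH = r"F:\NTDS\edb.log"
DB_PATH = r"C:\WINDOWS\NTDS\ntds.dit"


def drive_of(path):
    """Return the drive letter part of a Windows path, e.g. 'F:'."""
    return ntpath.splitdrive(path)[0].upper()


def on_separate_drives(db_path, log_path):
    """True when the database and its log sit on different drives."""
    return drive_of(db_path) != drive_of(log_path)


print(drive_of(DB_PATH), drive_of(LOG_PATH))
print(on_separate_drives(DB_PATH, LOG_PATH))
```

If that reading is right, the database on C: would depend on log writes to F:, so losing F: for a minute could stall the directory service even though ntds.dit itself was never touched.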
My other huge issue is why my secondary DC did not start
authenticating. I couldn't even log on to it during the downtime, and it
has AD and DNS running on it. There are no errors in the DS log
on this server during the downtime, but there are many replication
errors logged on the primary DC related to the backup DC trying to
replicate to the primary while the primary was non-responsive.
Thanks for any help, guys.
Bill