Thanks Brett! I will definitely make a copy of the ntds directory before any changes. I also plan to do a full hardware check before defragging/restoring. Thanks.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brett Shirley
Sent: Friday, August 26, 2005 6:15 AM
To: [email protected]
Subject: RE: [ActiveDir] Database Corruption

Alex,

Unfortunately, only the developer version of eseutil.exe gives out more
info, including a raw hex dump of the page. I'm a little curious to see
whether the tail of 81183 and the head of 81184 look skewed; sometimes
we've seen a disk corruption where the bytes seem right, just off by
several bytes ... but maybe a probable explanation will present itself
from just the output of the header ...

If you make a copy of the bad database (& logs) before you defrag or
restore, it gives you / us the chance to ask more questions about the
nature of the corruption later ...

Cheers,
BrettSh [msft]

> This posting is provided "AS IS" with no warranties, and confers no
> rights.

On Tue, 23 Aug 2005, Al Mulnick wrote:

> Hopefully it's just an index that's taken one for the team.
> Take the advice and ensure that the hardware is solid before
> declaring things well enough to be restored etc. This was the type of
> error in the Exchange world that would bug you till the end. It was
> associated with everything from disk controller settings (battery
> backup) to faulty disks, to transient hardware errors. Tough to
> diagnose, but almost always a hardware error (like >99% of the time)
> was the root cause. Software issues were sometimes to blame
> (misconfigured AV etc) that would take things out but see above for the
> frequency of that.
>
> The fact that it stays the same is a good thing. The fact that it
> occurred at all is not. Disk or other hardware would be my next
> suspect. All the way down to the motherboard (checked the revs to
> ensure no issues yet?)
>
> I have to also admit that a restore is not my favorite method if the
> bandwidth can support it. I'd prefer to dcpromo the repaired piece of
> hardware, especially for a smaller DIT. That's just my preference
> though.
> Good luck,
>
> Al
>
> ________________________________
>
> From: [EMAIL PROTECTED] on behalf of Alex Fontana
> Sent: Mon 8/22/2005 9:30 PM
> To: [email protected]
> Subject: RE: [ActiveDir] Database Corruption
>
> ECC memory, no errors in the event logs relating to memory. The ntds.dit is
> about 800MB. There are multiple events, the page number is always the same
> (81184).
>
> Haven't fixed it yet - it's limping along until this weekend when I'll dump
> the pages to see what the header shows - then either defrag or restore...
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brett Shirley
> Sent: Monday, August 22, 2005 10:22 AM
> To: [email protected]
> Subject: RE: [ActiveDir] Database Corruption
>
> Steve's, Hunter's, and your original advice are all sound ... I think it is
> very likely if you call PSS, they'll tell you to do Steve's, yours, and
> Hunter's advice in about that order.
>
> My favorite disk sub-system diagnostic is jetstress, but dedicated disk
> sub-system stressers are better, as they try odd patterns of bits that
> they know buses, electrical systems, and disks get fouled up on. Also do
> not ignore RAM checkers; a memory fault is almost as likely, perhaps even
> more likely here.
>
> Do you have ECC or parity memory? Any events in system or app event log
> related to parity memory issues?
>
> BTW, how big is your ntds.dit file? Is it over 1.5-2.5 GBs? That
> strengthens the hypothesis of memory issues.
>
> So you have multiple of these events? If you do, do they always happen
> for the same page numbers ("pgno") and offsets? If different, does their
> frequency increase?
>
> If you haven't restored it already, I'd be curious if you felt like
> sharing, what the page looked like from:
>     esentutl /m ntds.dit /p81184 /v
> ... then we could see how badly the header was corrupted. Also this will
> tell you if the page is an "Index page", and thus likely to be fixed by an
> offline defrag.
> If you see "primary" or "long value" page, offline defrag
> probably won't fix it.
>
> Also get the previous page too (change 81184 to 81183 in the above
> command). But again, only if you feel like sharing.
>
> Cheers,
> BrettSh
>
> This posting is provided "AS IS" with no warranties, and confers no
> rights.
>
>
> On Sat, 20 Aug 2005, Coleman, Hunter wrote:
>
> > I'd also look at running hardware diagnostics, particularly on the
> > disk subsystem and controller. No point in restoring or repromoting if
> > there is an unresolved hardware problem.
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] on behalf of Steve Linehan
> > Sent: Fri 8/19/2005 8:18 PM
> > To: [email protected]
> > Cc:
> > Subject: RE: [ActiveDir] Database Corruption
> >
> > Well the first thing I always recommend is to try an offline
> > defrag as it is possible that the corruption is in an index, i.e.
> > metadata, that can be rebuilt. If the offline defrag fails then
> > restoring from backup or repromoting will be your next step.
> >
> > Thanks,
> > -Steve
> > _____
> >
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ayers, Diane
> > Sent: Friday, August 19, 2005 6:43 PM
> > To: [email protected]
> > Subject: RE: [ActiveDir] Database Corruption
> >
> > My preferred approach would be to demote the box to member
> > server and re-promote to a domain controller to ensure a good fresh
> > copy of the DIT. YMMV as the specific requirements at your location
> > may prevent this. We have only run into this once early in our AD
> > days and this was the approach we used with good success.
> >
> > Diane
> > _____
> >
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Fontana
> > Sent: Friday, August 19, 2005 3:29 PM
> > To: [email protected]
> > Subject: [ActiveDir] Database Corruption
> >
> > Started getting the error below a few weeks ago on one of our
> > DCs.
> > My first reaction is to run a non-auth restore from a day before
> > this started happening and let replication take care of everything
> > else. Any reason NOT to do this? I'm concerned that this may
> > happen again and wasn't able to find anything specific to the error
> > below. Besides calling PSS, anything else I should look into before
> > restoring? This box holds all FSMO roles, Win2k3, server for NIS.
> >
> > TIA
> > -alex
> >
> >
> > Event Type: Error
> > Event Source: NTDS ISAM
> > Event Category: Database Page Cache
> > Event ID: 475
> > Date: 8/19/2005
> > Time: 2:00:24 PM
> > User: N/A
> > Computer: DC
> > Description:
> >
> > NTDS (528) NTDSA: The database page read from the file
> > "C:\WINNT\NTDS\ntds.dit" at offset 665067520 (0x0000000027a42000) for
> > 8192 (0x00002000) bytes failed verification due to a page number
> > mismatch. The expected page number was 81184 (0x00013d20) and the
> > actual page number was 2349964126 (0x8c119b5e). The read operation
> > will fail with error -1018 (0xfffffc06). If this condition persists
> > then please restore the database from a previous backup. This problem
> > is likely due to faulty hardware. Please contact your hardware vendor
> > for further assistance diagnosing the problem.
> >
>
> List info : http://www.activedir.org/List.aspx
> List FAQ : http://www.activedir.org/ListFAQ.aspx
> List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
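
A quick sanity check on the numbers in the Event 475 text quoted above. This is only a minimal sketch, not output from any ESE tool: it assumes the database file starts with two header pages (header plus shadow copy) ahead of page 1, an assumption that happens to agree with the offset, page size, and page number reported in the event. The names `page_offset`, `page_at_offset`, and `as_unsigned32` are made up for illustration.

```python
PAGE_SIZE = 8192     # bytes per page, taken from the event text (0x2000)
HEADER_PAGES = 2     # ASSUMPTION: header + shadow header precede page 1


def page_offset(pgno, page_size=PAGE_SIZE):
    """File offset at which database page `pgno` would start."""
    return (pgno - 1 + HEADER_PAGES) * page_size


def page_at_offset(offset, page_size=PAGE_SIZE):
    """Inverse of page_offset: which page number should live at `offset`."""
    return offset // page_size - HEADER_PAGES + 1


def as_unsigned32(value):
    """Show a signed 32-bit error code the way the event log prints it in hex."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # Offset from the event: 665067520 (0x27a42000)
    print(page_at_offset(665067520))        # 81184, the "expected" page
    print(hex(page_offset(81184)))          # 0x27a42000
    print(page_offset(81183))               # start of the previous page
    print(hex(as_unsigned32(-1018)))        # 0xfffffc06
    print(hex(2349964126))                  # 0x8c119b5e, the "actual" pgno read
```

If that assumption holds, page 81183 (the one Brett also asks about) would start exactly 8192 bytes earlier, at offset 665059328, so the tail of 81183 and the head of 81184 are adjacent on disk.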
