Cool, got Brett to sit up and type... Crap, now I have to read it. j/k, I like long answers from people like Brett, it gives insight into the person as well as into the technology. When people ask, how do you know so much about xxxx, it is because I piss off the people to make them teach me how it really works. That is how I learned most of the Exchange stuff back when I first started working on it. ;o)
> Joe, is the DB corrupt? An AD object without an RDN?

Good example, I would have to say maybe in that case. I expect it would either be a normal occurrence or take a serious failure of the AD App layer to allow that to occur, unless ESE for some reason decided not to write or retrieve it properly. Even though it isn't required at the ESE layer, I expect at some level of AD there is something enforcing the setting of that column. I don't know enough about the mechanics to say if it is bad or not.

> be very thankful Win2k3 AD isn't on SQL 2000, because it has
> few such protections, though SQL 2005 finally caught up, 10
> years after the fact, it's such a legacy DB, really ... anyway.

I am. Thank you Brett. Even though I want triggers and business rules, I would rather see them make it into ESE than move AD to SQL. In fact, I tell everyone who will listen that I will likely not willingly get very serious with MIIS while it is sitting on SQL. I would prefer to see ESE under it. I like ESE. I would even wear a "Brett says ESE rocks" T-shirt if I had one with that ugly mug of yours on it.

joe

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brett Shirley
Sent: Tuesday, December 06, 2005 3:04 PM
To: [email protected]
Subject: RE: [ActiveDir] Ntds.dit file corruption

I wouldn't say that, joe ...

Let's take another hypothetical real quick. Let's say you have a column for the RDN of an AD object (well, we do) and that value is NULL. From AD's perspective this object is, well, not really an object; it would be corrupt, and might even crash lsass.exe (I don't know, it might). From ESE's perspective, though, the table/row/column is valid; it has a particular column that doesn't have a value. A column which, I might add, is declared "optional" (real term is tagged) in the ESE layer "schema" (real term is catalog). ESE is simply a store of data, it passes no judgement on the data as long as it fits the schema guidelines for the column.
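The hypothetical can be made concrete with a minimal Python sketch. This is not ESE or AD code: the catalog, the column names (`object_id`, `rdn`) and both functions are hypothetical stand-ins, chosen only to show how the same row can be valid for the storage layer and invalid for the layer above it.

```python
from typing import Any, Dict

# Hypothetical storage-layer catalog: column name -> must a value be present?
# "rdn" is marked optional here, mirroring the "tagged" column in the message.
CATALOG: Dict[str, bool] = {"object_id": True, "rdn": False}


def storage_layer_accepts(row: Dict[str, Any]) -> bool:
    """The store checks only its own catalog: known columns, required ones set."""
    if any(name not in CATALOG for name in row):
        return False
    return all(
        row.get(name) is not None
        for name, required in CATALOG.items()
        if required
    )


def directory_layer_accepts(row: Dict[str, Any]) -> bool:
    """The layer above adds its own rule: an object without an RDN is invalid."""
    return storage_layer_accepts(row) and bool(row.get("rdn"))


row_without_rdn = {"object_id": 42, "rdn": None}
print(storage_layer_accepts(row_without_rdn))    # valid as far as the store knows
print(directory_layer_accepts(row_without_rdn))  # invalid for the layer above
```

Whether the row counts as "corrupt" therefore depends on which layer is asked, which is the question the message goes on to pose.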
Joe, is the DB corrupt? An AD object without an RDN?

----

I have a tendency to think in layers and sources of corruption.

    App Logical Layer
    AD Logical Layer
    ESE Logical Layer
    [ESE] Physical Layer

Corruptions coming top down through that stack are protected by the schema configuration/constraints of that layer (as joe astutely pointed out). Corruptions coming bottom up, from disk sub-system hardware, are protected by whatever mechanisms those layers have.

----

Dropping back to the above hypothetical: as an ESE dev I can say to the AD devs that until they can prove that ESE actually lost their column, it's most likely some sort of AD transactional problem, and the source is an AD bug. If I am feeling unbusy I will debug at the AD logical layer, because I know what it's supposed to look like.

----

Coming back to the original issue of replicating _this kind_ of corruption: a normal corruption coming bottom up, because the bits we (ESE) sent down to the disk subsystem were not the exact bits we got back later from the sub-system, is almost always detected by the fact that ESE checksums _every byte_ of its database pages ... and at this point everyone should be very thankful Win2k3 AD isn't on SQL 2000, because it has few such protections, though SQL 2005 finally caught up, 10 years after the fact, it's such a legacy DB, really ... anyway.

When the corruption comes up from the bottom, what happens is ESE detects the data is not checksumming, logs an event, returns a -1018 error (in this case), and starts rejecting DB operations (such as JetSeek() / JetRetrieveColumn()) that involve that corrupt database page. AD then responds to these failed DB ops with AD can't authenticate a user, AD can't return the results of a search, or AD can't read or apply data during replication (those 3 at least probably being the most common). In short the system starts limping, without affecting the rest of the distributed system.
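The detect-and-reject behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ESE's implementation: the page size, the header layout and the use of CRC32 are stand-ins chosen for the example, and `write_page`, `read_page` and `flip_bit` are hypothetical helpers. Only the -1018 value is taken from the message.

```python
import zlib

PAGE_SIZE = 4096            # illustrative page size, not ESE's real layout
CHECKSUM_BYTES = 4          # assumed: checksum stored at the front of the page
READ_VERIFY_ERROR = -1018   # the error value named in the message above


def write_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 of the body, then the zero-padded body."""
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") + body


def read_page(page: bytes):
    """Return (0, body) when the checksum matches, else (READ_VERIFY_ERROR, None)."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "big")
    body = page[CHECKSUM_BYTES:]
    if zlib.crc32(body) != stored:
        # The caller gets an error instead of the damaged bytes.
        return READ_VERIFY_ERROR, None
    return 0, body


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate the disk subsystem returning one wrong bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


page = write_page(b"rdn=CN=Example")
print(read_page(page)[0])                    # 0: page verifies
print(read_page(flip_bit(page, 803))[0])     # -1018: damaged page is rejected
```

The point of the design is that the damaged bytes never travel up the stack: the reader hands back an error, so a higher layer has nothing wrong to act on or replicate.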
----

Coming back to Jose's worry of old hardware injecting bad data into the distributed system. Fortunately, when the disk subsystem goes bad, ESE does a pretty good job of protecting you, but there are other sources of corruption besides the disk subsystem. An especially insidious one is the bit flip in memory (and yes, I see these too), which injects itself in the middle of the above stack. This kind of corruption can end up making its way both down to the disk subsystem (with a valid ESE checksum), and up and out to the distributed system.

From the perspective of older hardware though, I would _hypothesize_ that if you're going to have something go bad over time, the disk or the memory, keep in mind the disk is the only part of the computer that has a moving part. I would expect disks to go bad first.

----

I would generally not call USN rollback a corruption either, but I think Dean makes a fair and quasi-valid point that if you consider the distributed system, yes, such a thing is a corruption. Feel free to shim in an "AD Distributed System Logical Layer" in the above stack, between AD Logical Layer and App Logical Layer. I'm waffling on this point though, as something smells different than other types of corruption. I'm going to think about that for a long time ... in fact Eric (yes, the ~Eric) is at my door and says he would consider it corruption, so there is a long debate in my future as well ...

From a storage developer's perspective, what someone usually calls corruption is when the data layer they own or lower returns the wrong result.

From a non-storage developer's perspective, what someone usually calls corruption is when the data layer below them returns the wrong result.

----

I'll wax more philosophically on it later ....

Cheers,
BrettSh

On Tue, 6 Dec 2005, Dean Wells wrote:

> Great topic and, IMO, great answer ... I've only a few comments in
> addition to Joe's reply (inline).
> --
> Dean Wells
> MSEtechnology
> * Email: dwells <mailto:[EMAIL PROTECTED]> @msetechnology.com
> <http://msetechnology.com/> http://msetechnology.com
>
> _____
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
> Sent: Tuesday, December 06, 2005 8:56 AM
> To: [email protected]
> Subject: RE: [ActiveDir] Ntds.dit file corruption
>
> I may get into trouble with this post as Brett/Eric/Dean/Steve correct me... But that will be good.
>
> [DAW]
> I'm fairly certain Brett will have something to say on this one (in his shoes, I know I would).
> [/DAW]
>
> I will start with trying to differentiate between types of corruption... My idea of AD corruption is underlying table corruption. However some people may consider bad (really unexpected) values in AD to be corruption. The last isn't corruption, AD is simply a store of data, it passes no judgement on the data as long as it fits the schema guidelines for the attribute. If you have the DN of a user in the siteObject attribute that isn't corruption, it isn't good, but it is valid for the schema. Or if you have binary data in a unicode string, again, not corruption (a unicode string IS binary data).
>
> That being said, if apps (including parts of AD itself) hit unexpected data, you will have some issues even if it isn't truly "corruption"; it may as well be in some cases. In fact, table corruption is probably better than unexpected data in many cases.
>
> You might be able to argue that a USN rollback is corruption but I still don't consider it so. Valid data, just out of step.
>
> [DAW]
> That's an interesting one. If you treat the distributed database as a whole, then USN rollback is indeed a form of corruption even though each instance may deem itself consistent and intact.
> [/DAW]
>
> Again corruption to me is in the underlying tables. Since AD doesn't replicate the table structures, you can't pass that table corruption around.
> Once AD realizes that some portion of the database is corrupt which would probably be recognized by ESE saying, "that isn't right" and not passing info back up to higher levels, but instead passing an error.
>
> joe
>
> _____
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
> Sent: Tuesday, December 06, 2005 3:49 AM
> To: [email protected]
> Subject: RE: [ActiveDir] Ntds.dit file corruption
>
> Is this guaranteed? How can we/you be sure that the system will recognise the corruptions and therefore not replicate them? Surely this is akin to the new feature added to e2k3 sp1, but which is (sadly) missing from AD(?)
>
> I must be missing a subtle point - please show me the light :)
>
> neil
>
> _____
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Linehan
> Sent: 05 December 2005 19:26
> To: [email protected]
> Subject: RE: [ActiveDir] Ntds.dit file corruption
>
> We do not replicate corruption so if you have local corruption as noted below there is no worry that it would replicate around to other servers in the environment.
>
> Thanks,
>
> -Steve
>
> _____
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Phil Renouf
> Sent: Monday, December 05, 2005 1:04 PM
> To: [email protected]
> Subject: Re: [ActiveDir] Ntds.dit file corruption
>
> Will Read Only DC's take care of this? I don't know much about them yet, but it makes sense that if the copy of the dit that a DC has is RO that it won't try to replicate that anywhere and would only be the recipient of replication. Anyone with more knowledge about how RO DC's will work to comment on that?
>
> Phil
>
> On 12/5/05, Medeiros, Jose <[EMAIL PROTECTED]> wrote:
>
> Well at least the corruption occurred on just a single DC.
> One thing that has bugged me about Active Directory is not being able to select if you want a DC in a remote office to not have the ability to replicate back in a large enterprise environment. Since most remote offices only have a few people at the location and a DC is usually placed for improved logon and authentication time, many companies will either use a very low end server or a very old decommissioned one from their production data center (which is probably close to the end of its usable life). I am always concerned that once the NTDS.DIT file becomes corrupt it will replicate the corruption to the other DCs in the forest.
>
> Maybe I am just being a worrywart and this really is not an issue.
>
> Sincerely,
> Jose Medeiros
> ADP | National Account Services
> ProBusiness Division | Information Services
> 925.737.7967 | 408-449-6621 CELL
>
> -----Original Message-----
> From: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> [mailto:[EMAIL PROTECTED] Behalf Of Susan Bradley, CPA aka Ebitz - SBS Rocks [MVP]
> Sent: Monday, December 05, 2005 8:53 AM
> To: [email protected]
> Subject: Re: [ActiveDir] Ntds.dit file corruption
>
> I did? :-) I think I still said all I know is what the poster said :-)
>
> I think I need a course in event log reading because even with the logs, and the default size of the logs, I still don't see a smoking gun. The directory services one is filled with events 'post' blow up.
>
> What is interesting is that it seems to me big server land goes .. oh yeah... ntds.dit corruption... and sbsland freaks out.
> Either we do indeed need to ensure we have a secondary DC or we need to park a second copy of a system state offsite [say at the vap/var]
>
> Brett Shirley wrote:
> > She replied offline, very likely a single bit flip, tragedy, they aren't one release later (Longhorn), where this would've probably been non-disruptively handled, logged, and possibly self-healed:
> > http://blogs.technet.com/efleis/archive/2005/01.aspx
> >
> > Anyway, this kind of thing is usually hardware ...
> >
> > While there are much better disk sub-system testers, one that is freely available to any box with Exchange is jetstress. You might give that a try. If you can reproduce the event / error with jetstress I would not use that box in production.
> >
> > If you do reproduce the issue several times (several times is key, as you want a trend before you start playing the variable game), some things you might vary (one at a time):
> >
> > - Try making sure you have the latest driver and motherboard / controller firmware. Then see if you can reproduce.
> >
> > - Try a different RAID configuration, such as RAID1/RAID1+0 if you're on RAID5.
> >
> > - Try swapping out the hard drives, one at a time.
> >
> > - Adding the jetstress files to the exclude list in the Anti-Virus software. (A low probability, I've never heard of Anti-Virus causing this particular type of error, and I can't imagine the mistake an anti-virus product would have to have to cause this side effect)
> >
> > - If you can reproduce it several times, you could follow up with Dell.
> >
> > Good luck.
> >
> > I'm not sure if I answered your question ...
> >
> > Cheers,
> > BrettSh
> >
> > On Sun, 4 Dec 2005, Eric Fleischman wrote:
> >
> >> Going back to the original post, I'm not sure I fully understand the problem yet. Susan, can you define "ntds.dit file corruption" for us? What sort of corruption? What errors/events lead you to believe this?
> >> Specifically, I'm interested in errors from NTDS ISAM or ESE if you have any.
> >>
> >> ________________________________
> >>
> >> From: [EMAIL PROTECTED] on behalf of Susan Bradley, CPA aka Ebitz - SBS Rocks [MVP]
> >> Sent: Sat 12/3/2005 10:58 PM
> >> To: [email protected]
> >> Subject: [ActiveDir] Ntds.dit file corruption
> >>
> >> SBS box [with Windows 2003 sp1 since September]
> >>
> >> RE: [ActiveDir] Database Corruption:
> >> http://www.mail-archive.com/[email protected]/msg32676.html
> >>
> >> We have a SBS 2003 sp1 box with a corrupt ntds.dit that the Consultant and PSS have been banging on. Could not get the services back running, changed the RPC service to local system and some service came back up [I don't have all the details but the consultant opened a support case of SRX051202605433].
> >>
> >> Bottom line they are about to give up and start a restore but before they do that I'd like to get the view of the AD gods and goddesses around here. From all that I've seen, read, seen in the SBS newsgroup, the corruption of ntds.dit is rare to nil and an underlying cause is hardware issues [raid, disk subsystem]. This doesn't just happen.
> >>
> >> The VAP asked if not properly excluding the ad databases from the a/v would cause this/trigger this and my expectation is 'no', given that I doubt the majority of us in SBSland properly set up exclusions.
> >>
> >> Virus scanning recommendations on a Windows 2000 or on a Windows Server 2003 domain controller:
> >> http://support.microsoft.com/default.aspx?scid=kb;en-us;822158
> >>
> >> If this were my hardware and box, I'd be putting this sucker on the operating table and getting an autopsy before putting it back online.
> >>
> >> Are we right in being paranoid now about this hardware? For you guys in big server land you'd just slide over another box into that server role.
> >>
> >> ---------------------------------------
> >> Stupid question alert....
> >>
> >> Okay so we know that having a secondary/additional domain controller is a good thing even in SBSland...but question.... many times the second server in SBSland is a terminal server box because we do not support TS in app mode on our PDCs. So we've established that having a domain controller and a terminal server is a security issue [see Windows Security resource kit, NIST Terminal services hardening guide, etc etc....] If our second server is a member server handing out TS externally, should that be a candidate for the additional DC? Are the issues of TS on a DC ... true for 'any' DC? Would it be better then to Vserver/VPC a Win2k3 inside a workstation in the network if a third server box was not feasible?
> >>
> >> List info : http://www.activedir.org/List.aspx
> >> List FAQ : http://www.activedir.org/ListFAQ.aspx
> >> List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
> >>
>
> --
> Letting your vendors set your risk analysis these days?
> http://www.threatcode.com
