RE: [ActiveDir] best practice?

Brett Shirley Thu, 05 May 2005 11:19:54 -0700

I don't really have serious time to answer this right now ...  so for now,
you're going to have to trust me: it's not just a little bad, where you can
recover from it with X, it is _really_ bad to do an image based restore,
and hard to restore normality afterwards ...


I'll prop a portion of a slide deck later on, where I show to the backup
vendors how the inconsistency is introduced ... but I don't know if it
will make sense w/o my delivery.  It is also a bit simplified.  joe is
close below, some comments inline, in joe's mail, as it's the closest so
far to understanding why this is bad ...

BTW, clean and dirty AD DB have _nothing_ to do with this.  clean/dirty is
an ESE / JET Blue level concept, this is an entirely AD Logical issue.
Nothing prevents an ESE database from being imaged.  The AD has a design
decision that prevents image based restores.

I don't play XBox or any computer games really.  I know that sounds weird,
that a computer geek would not play video games, but I met a girl at a
party the other day who is a huge FPS player, so I think the world somehow
balances out in that respect.  How could that compare to the relaxing
sense of accomplishment of working out particularly cunning methods of
compressing replication metadata ... I mean really?  Same goes for hair
maintenance tasks.

On Thu, 5 May 2005, joe wrote:

> I am actually waiting for Brett or ~Eric to respond to your post as well. I
> am positive they could give you a bulleted list of things that you as well
> as the rest of us are completely unaware of that will go pear shaped both
> because they have seen things like that or just know it from familiarity
> with the code paths involved. 
> 
> AD will not do a complete reload of the DB on its own, that was an NT4 thing
> that occurred if the change log rolled. All gone now.
> 
> Do some searching on DSA IDs/GUIDs and Invocation IDs/GUIDS. A DSA ID is the
> GUID for the DC itself[1], it doesn't change for the life of the DC from my
> understanding. The invocation GUID[2] changes on restores, again to flag,
> hey new DB,

[BrettSh] It's not a new DB so much, as a new logical stream of changes to
the distributed system ...

>  ... you don't know what my state is, so it can be brought into a
> consistent state.

[BrettSh] Don't like the term "consistent state" here.  I also don't like
how we're talking about the DB ... I know all the AD repl docs, talked
about it as a new database GUID, but that was poor taste ... there is a
subtle but key difference between

        [local] database consistency, and 
        distributed system consistency.

It's the latter we're worried about.  +The latter requires multiple nodes /
DCs to have followed all the rules.+  Most of the rules are coded into the
way AD behaves, when possible.  Thou shalt not image restore, is
unfortunately not coded, and hard to defend against ... well,
without sacrificing availability ... but let's not get into that trade-off
right now.
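
The difference between the two kinds of consistency can be sketched with a
toy model.  This is a minimal Python sketch, assuming a drastically
simplified replication scheme (one watermark per source, no conflict
resolution, no transitive replication).  The DC class, its fields, and
simulate_image_restore are hypothetical names for illustration only, not
AD internals.

```python
import copy
import uuid


class DC:
    """Toy domain controller. Hypothetical model, not how AD is implemented."""

    def __init__(self, name):
        self.name = name
        self.invocation_id = str(uuid.uuid4())  # identifies this DC's stream of changes
        self.usn = 0                            # local update sequence number
        self.objects = {}                       # key -> (value, originating ID, originating USN)
        self.watermark = {}                     # source invocation ID -> highest USN pulled so far

    def write(self, key, value):
        """Originate a change locally, stamped with the next USN."""
        self.usn += 1
        self.objects[key] = (value, self.invocation_id, self.usn)

    def pull_from(self, source):
        """Pull only the source's own changes above the watermark held for it."""
        seen = self.watermark.get(source.invocation_id, 0)
        for key, (value, inv, usn) in source.objects.items():
            if inv == source.invocation_id and usn > seen:
                self.objects[key] = (value, inv, usn)
        self.watermark[source.invocation_id] = max(seen, source.usn)


def simulate_image_restore():
    dc1, dc2 = DC("DC1"), DC("DC2")
    dc1.write("alice", "v1")        # USN 1
    dc1.write("bob", "v1")          # USN 2
    image = copy.deepcopy(dc1)      # disk image taken at USN 2
    dc1.write("carol", "v1")        # USN 3
    dc2.pull_from(dc1)              # DC2 now holds watermark 3 for DC1

    dc1 = copy.deepcopy(image)      # image restore: USN is 2 again, same invocation ID
    dc1.write("dave", "v1")         # reuses USN 3 for a different change
    dc2.pull_from(dc1)              # DC2 believes it already has everything up to 3
    return dc1, dc2


if __name__ == "__main__":
    dc1, dc2 = simulate_image_restore()
    print("DC1 holds:", sorted(dc1.objects))
    print("DC2 holds:", sorted(dc2.objects))
```

In this toy run each DC's local store is perfectly readable on its own, yet
DC1 and DC2 disagree permanently: "dave" never reaches DC2 because it hides
under a USN that DC2 has already accounted for.  That is the distributed
inconsistency, and nothing at the local database level flags it.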

> You should find hits on invocation id with topics of
> replication consistency, usn polling, AD restores, etc as it is key to all
> of them though it has been awhile since I went searching for that stuff.
> Something I have read on a couple of occasions but can't say I agree with is
> that allegedly the DSA ID and invocation id are identical unless a restore
> has occurred. I don't think I have EVER seen them identical so I don't know
> where that info came from. I am noting it simply because I recall seeing
> documentation to that effect in the past. 

[BrettSh] They should've been the same until the first restore ... there
is a bug somewhere, that no one bothered to iron out.

BTW, we also change the InvocationID when we _re_-host an Application
Directory Partition ... I'll leave the discussion of why to your
imagination.  

Oh and since IFM is like throwing AD Restore and dcpromo into a blender
for 30 seconds, IFM based dcpromo sort of changes the InvocationID.  
You'll notice the invocationID of the DC you took the original backup from
in the retired DSA signature of the newly dcpromo'd DC.

> 
> Really try to find detailed info on how replication works. High USN is just
> the tip of the iceberg, there is a lot of underlying details but I
> understand where the misconceptions can come in, a lot of the documentation
> out there in the public realm simplifies the crap out of this stuff with
> analogies and very high level details without ever indicating that it is
> really quite more involved than that. This can burn you when you start
> making decisions based on those simplified examples. 
> 
> If you really want to get into it, start fishing through the platform sdk
> Ds* API calls. I would especially recommend the DsReplicaGetInfo/DsReplicaGetInfo2
> functions and out of those the ones concerning DS_REPL_NEIGHBOR structures
> which gives a feeling of how much info there is involved with replication
> and consistency.  
> 
> While it may be possible to force the invocationid to change after the image
> restore, I am not aware of a method other than doing a proper DB restore. It
> could be as simple as tapping that attribute in the nTDSDSA object but I
> certainly would NOT be willing to test that in production even if it worked
> great in the lab. 

[BrettSh] 

        Plausible Proposal #1: (please see big warning below)
        _Technically_, yes if you trigger an Invocation ID change after
        you lay down the image, _AND THIS IS THE KEY_ ... before the DC
        talks to any other DCs, and takes any new changes to the database.  

                This is one of those rules that all the nodes must follow,
                and if you use an AD based backup/restore program, the
                appropriate logic will be triggered, and the rules for
                distributed consistency upheld.

        _Even_ booting the DC, may institute a change, that causes
        distributed system inconsistency.  Obviously, tapping the object
        from LDAP is not an option, you have to do it from DSRM.
        Unfortunately, I've forgotten to tell you how you can trigger an
        invocation ID change from DSRM ...

In short don't go there. These are not the droids you're looking for.
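
Why the invocation ID change matters, and why it has to happen before the
DC takes or gives any changes, can be shown with a small sketch.  This is
an illustration in Python under simplifying assumptions: changes are plain
(invocation ID, USN, key) tuples and each DC keeps one watermark per
invocation ID.  changes_to_send and compare_restores are hypothetical
helpers, not AD functions, and the sketch is not a procedure to follow.

```python
import uuid


def changes_to_send(source_changes, dest_watermarks):
    """Return the changes whose USN is above what the destination has
    recorded for that change's invocation ID (0 if it has never seen the ID)."""
    return [c for c in source_changes if c[1] > dest_watermarks.get(c[0], 0)]


def compare_restores():
    old_id = str(uuid.uuid4())

    # State captured in the backup: two changes originated under old_id.
    at_backup = [(old_id, 1, "alice"), (old_id, 2, "bob")]

    # After the backup the DC originated "carol" at USN 3 and the partner
    # replicated it, so the partner's watermark for old_id is 3.
    partner_changes = at_backup + [(old_id, 3, "carol")]
    partner_watermarks = {old_id: 3}

    # Case 1: image restore. Same invocation ID, USN 3 gets reused.
    image_restored = at_backup + [(old_id, 3, "dave")]
    sent_after_image = changes_to_send(image_restored, partner_watermarks)

    # Case 2: restore that switches to a new invocation ID before any
    # replication. The new change belongs to a stream the partner has
    # no watermark for, so it is sent.
    new_id = str(uuid.uuid4())
    proper_restored = at_backup + [(new_id, 3, "dave")]
    sent_after_proper = changes_to_send(proper_restored, partner_watermarks)

    # The restored DC only vouches for the old stream up to the backup
    # point (USN 2), so the partner hands "carol" back to it.
    restored_watermarks = {old_id: 2}
    pulled_back = changes_to_send(partner_changes, restored_watermarks)

    return sent_after_image, sent_after_proper, pulled_back


if __name__ == "__main__":
    image, proper, back = compare_restores()
    print("sent after image restore:      ", [c[2] for c in image])
    print("sent after invocation ID change:", [c[2] for c in proper])
    print("pulled back by restored DC:     ", [c[2] for c in back])
```

In the sketch, the only difference between the two cases is which
invocation ID the post-restore change is stamped with.  If even one change
goes out under the old ID first, the partner's watermark logic has already
swallowed it.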

> 
> Certainly, do not image DCs and use that as a recovery mechanism. The one
> way to do that, IMO, would involve snapshotting and rolling back all DCs in
> a forest at the same time. I don't see how this could effectively be done in
> the real world on real hardware. I visualize possibilities with
> virtualization software, but that would require a lot of testing and work to
> get there and somehow guarantee that the snapshot was done at the exact
> time for all images. 

[BrettSh] 

        Plausible Proposal #2: (please see the big warning below)
        _Technically_, this will work too.  Requires all DCs to be off at
        the same time when you take the image based back ups (I  
        think).  Requires all the existing DCs to be turned off before
        you restart the first restored image.  I think that is all that
        is required ... but I'm not sure ... I don't care enough to try
        to give anyone 

        Plausible Proposal #3: (please see the big warning below)
        Of course a single DC forest can be image based restored as well,
        though ... you're more likely to get SIDs reissued, and have old
        wacky ACLs in this case, b/c IIRC we invalidate the present RID
        pool on restore.  This can be mitigated by booting the DC, and
        before creating any security principals, booting the next rid up,
        can't remember how that is done off the top of my head though ...
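
The SID reissue problem in Proposal #3 can be sketched the same way.  This
is a minimal Python illustration, assuming a single DC that hands out RIDs
sequentially from one pool.  The domain SID, the SingleDC class, the pool
size, and invalidate_rid_pool are all made up for the example and do not
correspond to real AD code or tools.

```python
import copy

# Made-up domain SID, for illustration only.
DOMAIN_SID = "S-1-5-21-1111111111-2222222222-3333333333"


class SingleDC:
    """Toy single-DC forest that issues SIDs from a sequential RID pool."""

    def __init__(self, first_rid=1100, pool_size=500):
        self.pool_size = pool_size
        self.next_rid = first_rid
        self.pool_end = first_rid + pool_size - 1
        self.principals = {}  # SID -> account name

    def create_principal(self, name):
        if self.next_rid > self.pool_end:
            raise RuntimeError("RID pool exhausted")
        sid = f"{DOMAIN_SID}-{self.next_rid}"
        self.next_rid += 1
        self.principals[sid] = name
        return sid

    def invalidate_rid_pool(self):
        """Abandon the current pool and start a fresh one above it, so no
        RID that might have been issued after the image can be reused."""
        self.next_rid = self.pool_end + 1
        self.pool_end = self.next_rid + self.pool_size - 1


def demo_sid_reuse():
    dc = SingleDC()
    dc.create_principal("alice")              # RID 1100
    image = copy.deepcopy(dc)                 # image taken here
    bob_sid = dc.create_principal("bob")      # RID 1101, issued after the image
    acl = {bob_sid}                           # some resource grants access to bob

    # Image restore with no further action: the next RID is 1101 again.
    restored = copy.deepcopy(image)
    reused_sid = restored.create_principal("mallory")

    # Restore that abandons the pool before creating any principal.
    safer = copy.deepcopy(image)
    safer.invalidate_rid_pool()
    fresh_sid = safer.create_principal("mallory")

    return acl, reused_sid, fresh_sid


if __name__ == "__main__":
    acl, reused_sid, fresh_sid = demo_sid_reuse()
    print("reused SID matches old ACL:", reused_sid in acl)
    print("fresh SID matches old ACL: ", fresh_sid in acl)
```

In the first case the new account inherits whatever the old ACL granted to
the account that vanished with the rollback, which is the "old wacky ACLs"
outcome.  Skipping past the old pool before creating any security principal
avoids the collision in this toy model.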

> 
> If you have done this in production already, I would recommend going back to
> what Brett said and doing a verification of your DB on all of your DCs.

[BrettSh] Jeez, I really hope no one is in this state, it can be quite
disturbing to iron out.

> Again, Brett is someone who knows about the AD DB. Don't let his sometimes
> grouchy demeanor throw you off. He may get difficult at times but he is
> almost always trying to help, he just has interesting ways of expressing it
> on occasion. He has actually been extremely nice on this list compared to
> some other notes I have seen from him.

[BrettSh] I thought I was being nice ... wow, it's going to suck, when
someone actually annoys me. ;)

>  Basically I say the same about him
> that I have often said about myself; don't mistake the quality of the
> delivery for the quality of the information. :o)

[BrettSh]

So first let me divulge, that I am not in fact the Garage Door Operator
for building 7, in fact I am a developer/programmer (we're called Software
Development Engineers at Microsoft) in Windows, ON Active Directory.  
Before my recent move to the ESE development, I worked on the AD
Replication development for ~5.5 years, spending time working on AD
Replication, AD backup/restore, a small bit in AD Schema/Database stuff,
AD tools, and even dabbling in DcPromo off and on when required for those
years.  Quite frankly I'm the one who has dealt with almost all the areas
affected by a bad image based backup/restore, and the parts that make a
good backup/restore possible.  I'm uniquely qualified to say:
        Image based backup/restores are not supported for AD.

So we had this customer who wanted to use SAN based hot split on Win2k AD
(which is even more unsupported, as they didn't shut down all the DCs, like
Plausible Proposal #2 above).  After I explained that they'd have to
shut down all DCs, and they agreed (though I doubted they'd actually do
that, it's amazing what customers will do when they think they understand
better than you), and they then agreed that for restore they'd take ALL the
DCs back to the same backup time, at the same time, and we worked out the
complicated set of steps they would need, I pointed out this:

--- begin quote ---
I can't confirm if you will fail ..., but that set of steps if correctly
followed will not cause forest corruption due to USN rollback.  Honestly,
it isn't worrying about this once PSS guided transition that worries me,
following those types of steps once isn't hard ... it is someone not
understanding why each of the parts of the technique were required, and
later trying it again, and not getting it right.  In general customers may
not truly understand the system's requirements, EVEN after they say they
do (b/c they believe they do, no one intentionally hoses their domain, but
somehow it happens) so it's just easier to say "no mirror splits on
unsupported SANs"
--- end quote ---

So ....

 Warning!  Warning!  Danger Will Robinson!  Danger!

So the same goes for all 3 proposals above ... while technically you could
work out the exact set of steps required, it is likely to be an error
prone manual process ... will the next guy who maintains the corp
infrastructure understand it all ... will you miss a step ... if you have
lots of DCs in branches, how do you know one won't be missed ... you're
playing with fire ... and the slightest tweaks can change the answer
substantially, for instance auth restore for proposal #1 must be done
after triggering the invocation ID to change, which would require a reboot
... even me with all my knowledge, wouldn't implement such a mechanism in
a live corporate deployment ... it's subtle, and it is not worth the risk.

Friends don't let friends use image based backups of AD.

Cheers,
-Brett [msft]
I'm just kidding, I just made all the above up, I really am just the
Building 7 Garage Door Operator ...


> 
>    joe
> 
> 
> 
> [1] It is the objectGUID attribute of the ntdsdsa object(aka NTDS Settings
> object). 
> [2] It is the invocationID attribute of the ntdsdsa object.
> 
>  
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Bahta Nathaniel V
> Contr NASIC/SCNA
> Sent: Thursday, May 05, 2005 10:22 AM
> To: [email protected]
> Subject: RE: [ActiveDir] best practice?
> 
> Joe,
> 
> I appreciate you indulging me in detail.  I was just curious on what the
> consequences may be of imaging and restoring DC's.  We are always evaluating
> and re-evaluating DR methods and techniques, and this was the latest hot
> topic.  I thought AD pushed changes up to a pre-determined amount and then
> it would just replicate the whole database if the number of changes were too
> great.  I am not sure of the in-depth implications of restoring imaged DC's
> but I know the difference between a clean and dirty AD DB and it sounds as
> though the metadata cleanup and synchronization is not meant to happen with
> an AD unaware application such as ghost.  Perhaps an application that could
> stabilize an old DC with the new AD DB would be something that would have to
> be looked at.  Or maybe an image of a member server and a dcpromo is the
> easiest way to recover a DC.  I have intentions on working smarter, not
> harder, but that does not forgo my lust for understanding right from wrong.
> 
> Thanks again for the rebuttal.  It always helps to hear things from all
> perspectives to get a better look at the big picture.
> 
> Nathaniel
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of joe
> Sent: Wednesday, May 04, 2005 2:36 PM
> To: [email protected]
> Subject: RE: [ActiveDir] best practice?
> 
> I'm not Brett[1] but wanted to just say something really quick here. 
> 
> Well a couple of things actually.
> 
> 1. When it comes to AD Database consistency and replication. Brett is
> someone I would tend to listen to very carefully. I may not understand what
> he is trying to say but I will try like heck to understand it. Rough around
> the edges though he may be, he knows a lot about the guts of the AD DB and
> Replication. Keep in mind he wrote some of the most "brilliant" parts of
> repadmin[2]. 
> 
> 2. When you image and recover the image you are bypassing any and all logic
> associated with a directory DB recovery. I.E. You aren't restoring the
> database through the very specific DS Backup/Restore API so you don't get
> the cool things that it does like renaming the Database GUID aka invocation
> ID which effectively tells all of the other partners there is a "different"
> database out here that needs to be fully updated. 
> 
> I haven't fully thought out the implications of that but one thing right off
> the bat is the thought that all DCs maintain high water vectors for all
> databases so they know where they are at for replication. This isn't just
> kept on the DC in question, this is kept all over so I could see serious
> possibilities of issues there. Additionally think of a change that mastered
> on that database and replicated out. How do you get it back if the DB is
> rolled back and all of the other DCs already think that DB has that info
> since it was mastered there?
> 
> You get ~Eric, Dean, and Brett thinking about it and I expect you could find
> all sorts of horrible things that this can do to you. 
> 
> I think the idea that a DC can be restored from an image like that because
> it is "sort" of like restoring the DB is flawed at the very best. You don't
> have a full comprehension of what is being done in the backend to support
> that restore. If it were that simple, why do you need a backup api at all?
> Mirror the DIT and zip it and there is your backup... It doesn't work that
> way.
> 
> As Brett indicated... Bad mojo... Heck I will go further, positively evil.
> You could damage your AD in ways that you (and it) have no clue about and
> only later run into it when you are trying to figure out niggling
> consistency issues in applications that act odd some of the time. 
> 
> 
>    joe
> 
> 
> 
> [1] And I couldn't play him on TV either, Brett stores a good portion of his
> height in his hair and I store mine in my legs. 
> 
> [2] His words when I met him in person at an MVP summit. He was quite
> excited to talk about that portion of the code...
> 
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Bahta Nathaniel V
> Contr NASIC/SCNA
> Sent: Wednesday, May 04, 2005 1:59 PM
> To: [email protected]
> Subject: RE: [ActiveDir] best practice?
> 
> Brett,
> 
> What is your basis for not being able to restore a DC from a image?  If the
> DC has an old copy of the directory data, it will check its USN's and update
> its copy.  What could cause havoc if anything?  We are about to institute
> this very same concept here to turn DR into a 10 minute process when it
> comes to operating system recovery.  We will image the servers monthly and
> restore from said image whenever one crashes.  What could cause a problem by
> restoring a DC, it will be timestamped to be old and AD will synchronize it
> with the rest of the domain.  
> 
> Please elaborate on your basis for comment.
> 
> Nathaniel Bahta
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Brett Shirley
> Sent: Wednesday, May 04, 2005 11:47 AM
> To: [email protected]
> Subject: RE: [ActiveDir] best practice?
> 
> jlc,
> 
> You can't restore a single DC via an image based backup, either.  It is not
> supported, it is not allowed ... it is bad mojo.
> 
> Well, it wouldn't cause issues if the forest had ONLY that one DC (seems
> unlikely the case), or for a multi-DC forest, you'd have to shutdown all the
> DCs in the forest at the same time, when you took your backup images.  
> And then on restore, restore them all at the same time.  Basically a pretty
> infeasible suggestion.
> 
> Cheers,
> -Brett Shirley [msft]
> 
> This posting is provided "AS IS" with no warranties, and confers no rights. 
> 
> 
> On Wed, 4 May 2005, Joseph L. Casale wrote:
> 
> > Exactly, I do it for DR purposes, the old one dies - I reimage it and 
> > put it back out there.
> > No problem...
> > jlc
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Phil Renouf
> > Sent: Wednesday, May 04, 2005 7:01 AM
> > To: [email protected]
> > Subject: Re: [ActiveDir] best practice?
> > 
> > On 5/4/05, John Shukovsky Jr <[EMAIL PROTECTED]> wrote:
> > > BUT....as for DC's. I do "image" dc's using Symantec Livestate 
> > > Recovery ( formerly PowerQuest V2i ). It works wonderfully. I 
> > > primarily use for backups. I have not had to recover a server in 
> > > production ( and hope I do not have to ) but I have in lab 10+ times
> > and servers are as clean as ever.
> > > You should take a look.
> > 
> > When Brett mentioned imaging DCs being a bad idea and to never ever do 
> > it I believe that he was meaning don't Image a DC and try to use that 
> > Image to build other new DCs and just trying to change the SID like 
> > you would for a desktop. Bad idea!
> > 
> > Phil
> > List info   : http://www.activedir.org/List.aspx
> > List FAQ    : http://www.activedir.org/ListFAQ.aspx
> > List archive:
> > http://www.mail-archive.com/activedir%40mail.activedir.org/
> > 
> 

