RE: [ActiveDir] AD DR - replication lag site----Why?

Grillenmeier, Guido Mon, 23 May 2005 10:21:22 -0700

oh, gee, I'm too late - but I had a great weekend ;-))

I'd have to say (and all the posts show this themselves) that there is no single 
right or wrong answer to lag sites.  It's one building block to mastering AD 
DR and may very well apply more for larger companies than for smaller ones 
(it's tougher to restore a multi-gig DB than it is to restore a few hundred 
megs, prior to performing an auth. restore).  I've been using and implementing 
them successfully but am not recommending them for everyone.  And we're also 
using them at HP and have been quite happy with them (you do recover stuff 
easily, which you would otherwise simply not bother to recover...)  And I also 
like how other 3rd party tools handle recovery - but those are also not 
applicable for all customers.


Great thread - it's a good overview of the vast range of different opinions 
on such a "fairly exotic" topic.

Cheers,
Guido

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of mike kline
Sent: Monday, 23 May 2005 13:38
To: [email protected]
Subject: Re: [ActiveDir] AD DR - replication lag site----Why?

haha, ok so you MVPs also have these special powers.  

Very good thread and thanks to all.   This is a subject I didn't know
much about until this thread came along.  Thanks to Todd, Joe, Jorge
and everyone else that contributed.



On 5/23/05, Myrick, Todd (NIH/CC/DNA) <[EMAIL PROTECTED]> wrote:
> Using the powers of the MVP, I now officially pronounce this thread as
> complete :)
> 
> Todd
> 
> -----Original Message-----
> From: Jorge de Almeida Pinto [mailto:[EMAIL PROTECTED]
> Sent: Sunday, May 22, 2005 4:12 PM
> To: 'joe '; '[EMAIL PROTECTED] ';
> '[email protected] '
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> (1)
> In the Netherlands when you own a car and you drive with it, by law you at
> least must have a basic insurance that covers liability. This simply means
> that when you cause damage with your car the other party gets paid to repair
> their damage. You, however, have to pay for your own damage.
> This is for those cases when most people cannot afford such high costs. This
> also applies to situations in IT. You'll always have to answer: how much can
> you afford when this or that occurs?
> 
> What I'm trying to say here is: you implement a certain solution to
> accommodate (that is, to help resolve) those situations where people have
> made errors. People simply make mistakes and I agree it is better to prevent
> errors than to repair them. But again it is not always possible, because
> it costs too much money, or they simply don't care or they are not aware of
> the damage that can come from it, etc. How many times have you heard "it
> will not happen to me, only to others"?
> There are also a lot of people that only understand (or get interested in)
> what you are trying to say/explain when they are in deep sh*t. But then...
> it's too late and much more expensive solutions/activities must be used, if
> a solution exists for the occasion at all.
> It is always the choice between: "trying to save money now and spend a crap
> load later each time it happens" or "spend a little bit of money now and
> spend less money later". I believe in spending money now to save later
> (long-term thought). A lot of managers only think about spending as little
> money as possible. Eventual problems in the future are not problems at the
> moment (short term thought)
> 
> (2)
> >> When I say rollback, there is nothing left of the forest to get a USN
> rollback and no worries of TLS.
> 
> I understand that an old state of the virtual environment is only used when
> ALL other DCs are down, gone, bye bye, etc. (or at least isolated and some
> of the activities mentioned in the MS DR WP are done like resetting all
> kinds of accounts/trusts)
> When I think about it now, you are right, I made a mistake, sorry for that.
> Yes, when "giving life" to virtual DCs that belong to the same old state of
> the virtual environment all those virtual DCs know about the state of the
> other DCs in the virtual environment.
> I do believe this solution provides a very fast recovery of the first DCs in
> the forest to be rebuilt.
> With this solution, as with the native way (restoring a backup), if the state
> you bring back (the virtual environment state or the backups used) is older
> than the tombstone lifetime, you will experience event 2042 on all DCs (Event
> ID 2042: It has been too long since this machine replicated)
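
A minimal sketch of the age check implied above: before bringing back a saved state (virtual disk image or backup), compare its age with the forest's tombstone lifetime. The function name and the 60-day figure are assumptions for illustration only; read the real tombstone lifetime from your own forest rather than trusting this number.

```python
from datetime import datetime, timedelta


def saved_state_is_usable(taken_at: datetime, now: datetime,
                          tombstone_lifetime_days: int) -> bool:
    """Return True if the saved state is younger than the tombstone lifetime.

    Hypothetical helper: it only compares dates and does not query a directory.
    """
    return now - taken_at < timedelta(days=tombstone_lifetime_days)


# Illustrative values only; 60 days is an assumed tombstone lifetime.
print(saved_state_is_usable(datetime(2005, 5, 1), datetime(2005, 5, 22), 60))
print(saved_state_is_usable(datetime(2005, 1, 1), datetime(2005, 5, 22), 60))
```

The first call is a three-week-old image and passes; the second is more than 60 days old and fails, which is the case where the too-long-since-replicated condition described above would be expected.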
> 
> After bringing the initial environment up you need to execute the steps as
> mentioned in the MS DR white paper.
> You especially need to think if everything really is down or the corrupt
> forest is still up to provide functionalities for the users while restoring
> a new forest in parallel.
> And yes, there are a lot of decisions to be made and each IS different for
> each company
> 
> Cheers
> #JORGE#
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> To: [email protected]
> Sent: 5/22/2005 7:05 PM
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> 1.
> 
> > I assume almost everyone has an insurance policy for their house if it
> burns down.
> 
> In the US, you can't get a mortgage unless you get insurance. Ditto
> cars,
> loans require full coverage and local law enforcement requires some
> minimal
> level of insurance, you have no choice in the matter. A lot of people
> buy
> insurance not because they want it or would get it themselves, but
> because
> some requirement forces them to. Buying insurance is like enforced
> gambling,
> you are forced to pay to gamble that your house will burn down or you
> will
> crash your car. Insurance companies who set the prices and hence the
> profit
> are gambling you won't have an issue and have weighted the payoff
> accordingly. If you ever take advantage of that insurance, you are
> pretty
> much guaranteed to see an increase in your rates at some future point
> for
> taking advantage. Of course you are also quite likely to get an increase
> when Dean's house gets whacked by a hurricane as well.
> 
> Insurance on cars and houses is not an optimal example. Maybe one that
> is
> closer would be the optional insurance you can get for car malfunctions
> or
> electronics or even MS Software... AKA service contracts. Mr. Jones you
> should protect this TV because it could possibly fail in the next year
> and
> you don't want that expense.... Mr. Jones, you should protect this MS
> environment because you may have something fail in the next year and you
> don't want that expense..... Of course, the first thing people start to
> wonder at that point when hearing those pitches is why are there
> warranties
> at all... Again, it is gambling, only, unlike the insurance stuff
> mentioned
> above, you have a realistic choice with several options.
> 
> > What is that better answer in your opinion?
> 
> The better answer is to understand why this needs to be done and explain
> how
> you can get away from it. I have lived the "Fortune 5 argue until you
> are
> blue in the face about giving out too many permissions" life. When I got
> blue in the face and people didn't listen, I did it some more. It wasn't
> like I walked in and said some stuff and everyone said, well cheerio,
> great
> idea, let's do it. I had to go through all of the issues that were being
> experienced and the goals the company had and show how locking the
> environment down would help all of it. It was a slow and extremely
> painful
> experience over the years. You can ask Deji, I literally had people who
> would have tried to kick my ass had they gotten me alone as I started to
> win
> the arguments.
> 
> Again, technology can not save you from bad policy. It just doesn't have
> the
> power to do it. You have to, as an admin, as a consultant, as someone
> there
> to tell people what is right, have the chutzpah to point out what is
> stupid
> and say, this makes no sense - it is contraindicated by your goals.
> Without
> deviation from norm, there can be no progress. If I ask someone why they
> do
> something and they say that is the way it is always done, that means the
> something is suspect and needs to be reviewed. If I hear myself saying
> it, I
> review it.
> 
> Political issues are something I understand. You want politics, step
> into a
> Fortune 5 company. You would swear you were sitting in a House of
> Congress.
> You have to find the political hotbuttons and push them or point out
> different buttons to people who didn't realize they had a button in the
> first place. It is very dynamic, one solution will not work everywhere,
> but
> it is the job of any admin or consultant who cares to do their best to
> point
> this out and correct it if they are involved. I have done this in so
> many
> ways for so many things over the years I don't recall most of it. I do
> recall that I was always hated for pointing it out but in the end when
> it
> was better, people respected me and liked me even more because things
> were
> overall better.
> 
> It is very likely the reason I could do this and not worry is that I
> have
> never had a job I was so overly attached to that I wouldn't walk away
> from
> or risk being fired. I expect that may be rather unique but I am pretty
> confident in my ability to find a new job, even if it has to be washing
> dishes. It has nothing to do with me needing to be right, it is all
> about
> doing the right thing. Bowing to political pressure almost never is the
> right thing. If you cave in to it and things go wrong later, politicians
> are
> good for pointing at you and saying, he/she went along with it and that
> is
> our technical information source...
> 
> In the meanwhile, you, if you are the admin running a dangerous
> environment,
> protect yourself. If you choose a lag site, so be it. If you know how to
> set
> it up properly and use it and want to be in the world of auth restores
> of
> objects, knock yourself out. I would choose the option of populating the
> data to a protected store and using undelete or using x08, a protected
> store, and undelete. Overall I think it is an easier, faster, and
> cheaper
> solution. If I was in an environment where undelete wasn't available, I
> would fight like hell to get it there or take away the rights from
> people to
> stick a hot poker up my butt and deal with the political consequences.
> I've
> done that. I would rather spend 80 hours a week doing other people's
> jobs
> than 40 hours a week trying to put crap back together that some bonehead
> who
> shouldn't have the ability screwed up.
> 
> > In the end there is a big difference between "being right" and
> "getting
> it"!
> 
> Agreed. But because it is hard or someone said it isn't going to happen
> isn't a reason to not pursue it. Again I have been told so many times
> something would never happen and turn around and work it and work it and
> work it and see it happen a few months or years later. You just have to
> be
> willing to look at what is wrong, point out the issues, and then every
> time
> something happens that the solution you had would have helped, point out
> again why that solution needs to be implemented. I have no problem
> looking
> an Exec or Director or Manager or anyone else in the eye and saying,
> "this
> wouldn't have been an issue if we were doing this right" or even saying,
> "this is what I told you about before, do you understand what I was
> saying
> now?". Your manager doesn't listen, escalate.
> 
> 
> In summary, if you do silly things, you will be bitten. You can try and
> put
> things in place to make it so you bleed less but you still end up with a
> bite and probably an ugly scar. You certainly spend a lot of time
> putting on
> antibacterial healing salves that could be better spent doing something
> else.
> 
> I have something that I tend to say a lot to admins. Don't get into a
> situation where you are so busy chopping down trees that you can't
> sharpen
> your axe. It is a losing position. If you find yourself there, you must
> utilize your own time and resources to make time to sharpen your axe or
> else
> you are doomed to chopping down a forest you can never get to the end of
> because you get slower and slower as your axe grinds down to a nub and
> then
> you can't accomplish anything. That isn't a fun position to be in. Work
> isn't supposed to be something you hate, you shouldn't actually dislike
> most
> of the week. If you do, you are doing something wrong or you are in the
> wrong job.
> 
> 2.
> 
> > However I do think you have to be careful with this because of
> > USN rollback, tombstone lifetime and replication and maybe some
> > other stuff as the DCs are (I think) not recovered using the native MS
> way.
> 
> When I say rollback, there is nothing left of the forest to get a USN
> rollback and no worries of TLS. I am talking about every DC is just
> gone,
> shut off, fdisked, etc. The rolled back environment comes up and is the
> entire environment. Even if a DC is up that isn't part of that, it is
> wiped
> from the seed environment and gone from name resolution. The issues you
> could have will be SID issues on resource ACLs and reverted passwords on
> any
> objects that have passwords. However, if you got to this point, that is
> called acceptable losses and you clean it up after you have the business
> functioning again.
> 
> As an example, we had this exact plan for every schema update deployment
> and
> any seriously major update, the world has ended scenario. Something that
> wasn't found in Test or QA that knocks a global forest down for the
> count.
> We had a lifeboat with a DC or two from every domain that was off the
> main
> network. Everyone was ready to power down their local DCs and insert
> rebuild
> CDs in based on a call from our group. The only unknown was when to call
> for
> that point to start.
> 
> That point is variable and has to be. Unless you know exactly how long you
> can be without the environment as a whole and how long it will take to
> rebuild to an acceptable level, you will be making an ad hoc judgement of
> when to go whole hog recovery. If you know it takes 5 hours to get to an
> acceptable level and you know you can't be fully down for more than 12
> hours, you know that if you have a complete down, at the 7 hour mark
> your decision is made for you.
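
The arithmetic in that example can be written down as a small sketch. The function and parameter names are hypothetical; the only inputs are the two figures the paragraph says you need to know (maximum tolerable full outage and time to rebuild to an acceptable level).

```python
def latest_decision_hour(max_outage_hours: float, rebuild_hours: float) -> float:
    """Hours into a full outage by which a from-scratch recovery must start.

    Hypothetical helper: past this point the rebuild can no longer finish
    inside the tolerable outage window, so the decision is made for you.
    """
    if rebuild_hours > max_outage_hours:
        raise ValueError("rebuild takes longer than the tolerable outage")
    return max_outage_hours - rebuild_hours


# The figures from the text: 12 hours tolerable, 5 hours to rebuild.
print(latest_decision_hour(12, 5))  # 7
```

If the rebuild time exceeds the tolerable outage, the sketch raises an error rather than returning a negative number, since that situation means the plan itself cannot meet the requirement.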
> 
> When to make the decision depends on what your recovery plan is, how long it
> takes to execute, what is dependent on you, what priority each of those
> dependent things has for getting back up, and many other things.
> If you have a full forest failure, you don't need every DC of the forest
> back up and running immediately unless you only have one DC per domain,
> and
> if that is the case, you tend to have a fairly simple recovery process.
> More
> than likely you can have a pool of machines at a DataCenter hub that can
> get
> brought up to service the environment in a minimal fashion while you get
> more up. So what is that minimal fashion? Do you know for your company?
> It
> will be different for every single company. What applications or
> services
> must be available in 2 hours, in 4 hours, in 8 hours, etc. Are your
> plans
> structured for that? Do you have enough money and resources to
> accomplish
> it?
> 
> You will of course try to troubleshoot the issue. At some point you
> could
> make the decision or be forced to make the decision to recover from
> scratch.
> Have you thought about the situations that could cause that and do you
> have
> an inkling of a plan to do it? Do you know the prioritization globally
> of
> apps and services that depend on your resources? Is there one? In many, or
> probably even most, companies the answer is no; everyone assumes they are #1.
> 
> In the last place I was at where it was my thing to think about that stuff,
> the priorities were basically this: get something up. This is the seed; it is
> enough to get other DCs starting to be built. Once that next set of DCs is
> built, that services the DataCenters for Data Center specific stuff,
> then
> you get more DCs up to handle some small amount of WAN traffic coming
> in.
> Then you get more and more capacity for WAN traffic. In the end, you
> work on
> getting the WAN sites up. The seed should be up and running in less than
> an
> hour. IMO, done properly, it should only take as long as it takes to
> copy
> the proper virtual disk images to a machine that can run them. Then you
> start dumping IFM images and spinning up real DCs to handle Data Center
> Apps
> (they are at the data center because they are important right?). This
> second
> stage can vary but shouldn't be any longer than a couple of hours, how
> long
> does it take you to build a server from the ground up? Do you have an
> automated load process? If not why not? Once built, IFM, and now you have
> your
> initial data center structure. Take a breath, build up some extra
> capacity
> DCs for the datacenter. If you have enough resources, you did this when
> building the initial real DataCenter DCs. Getting Data Centers up and
> running enough to handle local and WAN requests should be something that
> takes hours, not days. Days, weeks, months is the time frame for WAN
> site
> DCs. Doesn't mean the WAN sites aren't working, they just aren't working
> in
> an optimal state. If you are doing cool virtual stuff where it isn't
> just
> your seed using the virtual environment, you could have your environment
> up
> very quickly. Watch Dean roll back 20 or 30 machines in a few seconds some
> day
> and you get an idea of what can be done with virtualization.
> 
> 
> 3.
> 
> My thoughts on this are always to simply rebuild. A DC shouldn't be
> unique
> enough to require a restore from backup. As Rick mentioned previously,
> they
> are little tin soldiers. Knock them down as you need to. IMO, a properly
> configured environment can have just about any DC knocked down at nearly
> any
> time without many adverse effects due to failover and fault tolerance.
> There
> are examples where this isn't the case such as apps that point at a
> specific
> DC. If you have any encounter with apps like this, beat them until they
> fix
> it. If it is a synced type app, you will be hard pressed to get it
> changed
> (like say the RUS). If it is simply an app using AD for auth or data
> retrieval, there is no excuse for letting them hard point to one machine.
> I
> have been in many fights with integrators and developers of Unix/Java
> based
> apps that only allow them to select one machine. My response to them is
> find
> a better way or else they WILL break. I will not pony up additional
> responsibility because their app doesn't work properly.
> 
> 
>   joe
> 
> 
> 
> -----Original Message-----
> From: Jorge de Almeida Pinto
> [mailto:[EMAIL PROTECTED]
> Sent: Sunday, May 22, 2005 8:41 AM
> To: 'joe '; '[EMAIL PROTECTED] ';
> '[email protected] '
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> Hi,
> 
> In my opinion the following recovery situations exist when it comes to
> AD:
> (1) Accidental object deletions
> (2) Your forest/domain drops dead
> (3) A DC drops dead
> 
> (1) Accidental object deletions
> I agree with Joe that people should only have those permissions needed
> to do
> their work and this should be configured accordingly. I also agree that
> not
> too many people should have domain/enterprise admin permissions. However
> in
> the real world this is not always possible because of a lot of reasons
> (history, politics, etc.). Organizations are not 100% perfect, that's a
> fact
> also. Looking at the future and preparing for the worst, solutions are
> implemented to mitigate those risks. Costs are incurred in advance to save
> time
> and money in the future. It's somehow like an insurance policy. When
> something goes wrong I have something to fall back on. I assume almost
> everyone has an insurance policy for their house if it burns down. How
> many
> times will you use that insurance policy in your lifetime? Never if
> you're
> lucky... once if you're in bad luck... twice if you're in really bad
> luck!
> In the case of accidental object deletions customers need/want a
> solution!
> What is that solution? Is it a lag site, is it a tool like Quest
> Recovery
> Manager, is it a tool like Guido's tool, is it something else I/we still
> don't know about? It all depends on the functionality needed by the
> customer
> and the cost to implement and maintain the tool/solution.
> 
> In my opinion a LAG is one of those solutions for accidental object
> deletions, as always and only when implemented correctly.
> 
> Joe (and others), you don't recommend setting up lag sites as there
> could be
> a better answer. What is that better answer in your opinion? What would
> you
> do if a customer said to you: "I want to have ADMIN rights and I want to
> be
> able to delete objects in my forest/domain and I want you to provide a
> solution for me if I delete the wrong objects"? (The answer "take away admin
> rights" is not an option ;-)) ) What is your solution for accidental object
> deletions? That is what I'm interested in.
> 
> In the end there is a big difference between "being right" and "getting
> it"!
> 
> (2) Your forest drops dead
> I don't think LAG sites are a solution when your forest drops dead,
> especially in a large environment. What's the primary goal to achieve when
> your forest drops dead (and what's the second?)? (please give me answers..)
> When the forest drops dead, nobody can do anything anymore.
> In my opinion the first goal to achieve is to get everything up and running
> as fast as possible and provide as much functionality as possible to the end
> users. In my opinion the second goal is to repair the health of the forest
> and, if it is really screwed, rebuild it. So for this you need a procedure
> that accommodates those situations.
> I always hear everyone talk about a forest recovery as in rebuilding the
> forest from scratch. Rebuilding a forest because it dropped dead should be
> (again in my opinion) the last step ever taken because this means you're
> going back in time and therefore you will lose info. I believe that there
> is a lot of ground between a healthy forest and a forest that needs to be
> rebuilt. Do you guys agree?
> 
> As for the "virtualized environment that can be rolled back to any point
> in
> time" I think that can be part of a solution to start rebuilding a
> forest.
> However I do think you have to be careful with this because of USN
> rollback, tombstone lifetime and replication and maybe some other stuff as
> the DCs are (I think) not recovered using the native MS way. At DEC I heard
> Dean and Joe and some other guys talk about this method. Unfortunately I did
> not hear the complete story behind this and to be honest I have not put any
> time into thinking about it and how it may work as a quick start for a
> forest rebuild.
> 
> (3) A DC drops dead
> We all know this one.
> Restore the DC from a backup or do a metadata cleanup and rebuild the DC
> from scratch
> 
> Cheers,
> #JORGE#
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> To: [email protected]
> Sent: 5/22/2005 1:15 AM
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> Reread it Deji, I really am not agreeing with it. I noted that it might
> be
> something that could be used for whole forest corruption but I would way
> prefer a virtualized environment that can be rolled back to any point in
> time over a site lagging behind the main AD in *hopes* that it didn't
> get
> poisoned.
> 
> To make it more obvious I guess, I don't recommend lag sites. However, I
> don't recommend people tear them down if they have them. Mostly I don't
> recommend setting them up in the first place unless they are fully aware
> of
> why they are doing it and why they think there is no better answer.
> Technology doesn't often successfully make up for bad policy.
> 
> What I recommend is that they batten down the hatches even if they think
> they can't because it has always been this way or because some Exec
> who
> needs to be taught better thinks L1 Help Desk should be able to delete
> things in an unhindered unconfirmed way, etc. I recommend they use x08
> and
> admod, I recommend they talk to Guido about a product he has put
> together to
> recover stuff which combines undelete with repopulate. Mostly I
> recommend
> not allowing accident prone folks to have the power to piss in your
> wheaties. I have never had a case where I didn't take away permissions
> for
> other people to do things and my life not get easier and the environment
> get
> more stable and secure. I don't know how many times I have heard, but I
> can't do my job without those god level rights and sure enough, without
> those god level rights, they can still do their job. The difficulty here
> is
> convincing the right people that this is the right way to go and is
> often
> defeated when the people pushing for the lockdown can't argue the
> technical
> merits or can't come up with answers for questions on how to do the work
> in
> alternate ways. That is tough work, I know, I spent many hours working
> through those issues myself. More than once I took work home with me and
> cracked open MSDN trying to find a better safer way for a developer to
> do
> something. If I couldn't find an alternate method, I built some sort of
> delegation tool to do the work on their behalf or stepped up to the
> plate
> and said I would do that work when they requested it (and then worked
> like
> heck to find a better way). I would much rather sign up for a lot of work than
> give out too many permissions even for a short period of time. Not
> giving
> rights is much easier than taking them back later.
> 
> Back to lag sites, if someone has a lag site and they like it and find
> it
> useful, I am behind their use of it. Of course my question to them if I
> was
> paid to look at their environment and comment on that aspect of it is
> "Why
> do you feel you need it?". Is this something you find yourself using a
> lot?
> Do you have any thoughts that possibly this is indicative of some other
> type
> of issue that could be prevented versus reacted to?
> 
> The Microsoft world has yet to really learn from the mainframe world.
> Maybe
> because it is old, people think it isn't good. The mainframe model is
> quite
> locked down. You don't give a ton of people rights, people have what
> they
> need to do their exact job and even that goes through a ton of
> filters/processes/batch, rarely if ever does anyone get core level
> change
> access rights that isn't thrown through rules and logging. Why? Because
> it
> is bad to allow just any old changes. Nearly any change in the mainframe
> world is change controlled to within an inch of its life. I think this
> is
> good for MS tech as well. It will get there as we mature, we see it
> happening now. Having lots of people that can make changes ad hoc does
> not
> increase flexibility and mobility of a company, if anything, in my
> opinion,
> it makes support more costly for a company by making the environment
> more
> difficult to support and understand.
> 
>  joe
> 
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> [EMAIL PROTECTED]
> Sent: Saturday, May 21, 2005 3:45 AM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> Joe, you pretty much agreed with the lag site proposition towards the
> end of
> your piece. Whether you virtualize it, put it in a different physical
> location or just put it on a piece of hardware sitting in the same
> server
> room and configured with a different replication schedule, it all comes
> down
> to the same necessity of having a pristine DC that has not received your
> deletion and from which you can repopulate your F'ed up AD.
> 
> I know that you think deletion should not happen, but I have seen a few,
> so
> they do happen in reality. We've been over the discussion of the
> politics
> behind rights and permissions in many organizations and how they are
> what
> they are because we can't control them. So, bad things happen. If you
> are
> rolling in surplus money, you get a tool. If you are cash-strapped or
> like
> to roll your own, you get a qtine (lag) site.
> 
> I do not think one is better than the other.
> 
> 
> Sincerely,
> 
> Deji Akomolafe, MCSE+M MCSA+M MCP+I
> Microsoft MVP - Directory Services
> www.readymaids.com - we know IT
> www.akomolafe.com
> Do you now realize that Today is the Tomorrow you were worried about
> Yesterday?  -anon
> 
> ________________________________
> 
> From: [EMAIL PROTECTED] on behalf of joe
> Sent: Fri 5/20/2005 10:07 PM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> 
> 
> I would tend to agree with what David is saying from what I have seen of
> lag
> sites as well.
> 
> Relatively few people are doing it, and those that are are likely to be
> doing it in a roughshod way.
> 
> I am not a huge fan of lag sites. I think they are ok, but for instance
> didn't think they deserved 3 or 4 different speakers talking about it at
> the
> DEC in DC a couple of years ago.
> 
> I am far more interested in taking away the rights from people to do the
> stupid deletions in the first place like was mentioned previously.
> Seriously, I have done 0, count them, 0 restores of objects in
> production
> and have been involved in some rather seriously sized implementations, 5
> years of lead AD tech for a Fortune 5 directory. The lax attitude that
> accidental deletions just happen is not a mentality I am likely to subscribe
> to. If
> someone deleted something, my feeling is, they knew what they were doing
> and
> they were adequately aware of what they did.
> 
> First off, don't delete right off. Disable, rename, and move.
> 
> Second off, don't do admin through the GUI; it is too easy to click on an OU
> rather than a single user when deleting.
> 
> Third off, don't let people have the power to delete things. Let them
> request deletes of automated systems that are designed to follow good
> rules
> and so appear to be smarter than the admins.
> 
> There were mentions of supportability, etc. I would not be surprised to
> hear
> MS say this is supported. Honestly, it isn't that whacky from a
> technical
> standpoint. However, if someone has gone through the supportability review
> process I
> *HIGHLY* recommend they keep any and all docs with the names of the MS
> people involved locked up and saved. I have had it occur more than once
> over
> the years where I was told something was supported and fine and then
> several
> years later have them looking at me saying they would never have
> approved
> this or that. Some of the times I didn't have docs and was screwed as MS, I
> have found, is fond of saying "we don't have any documentation of that
> being
> said or being done", other times I had docs and then I see PSS trying to
> find reasons why they missed the issue or something else in the doc not
> being followed that they try to imply makes the whole thing moot.
> Unfortunately PSS will declare a lot of things as unsupportable even if
> they
> have no good answer themselves, for instance, scripted GPO deployment
> pre-GPMC. There were several years there that people were forced to come
> up
> with their own mechanisms for scripted GPO deployment before GPMC was
> released because the normal GUI just wouldn't cut it, they are all
> unsupported by MS. Unfortunately companies won't tend to find out until
> they
> contact MS about it or PSS stumbles upon it.
> 
> Back to lag sites: you, of course, have the possibility of directory
> corruption, etc., where you lose the entire directory in one fell swoop. A
> lag site could be used here, but an auth restore is probably not going to
> be what saves you; you need to rebuild everything. Personally, over a lag
> site I would use a site with a bunch of virtual DCs that you take down
> together and back up the disk images of. Then, if you need to roll back,
> you pick the day or the 4-, 6-, 8-, or 12-hour period and roll back to it
> once everything else has been taken offline, and you build the rest of
> your environment back out from this "seed" environment. This gives you the
> additional benefit of having an environment you can take into a segregated
> lab and test stuff any time you need to. It just needs to be done right or
> you will have Brett snickering at you.
> 
> As I mentioned in an earlier post, if you are afraid of deleted objects, I
> would recommend judicious use of searchflags&0x08 and admod with the -undel
> option. Couple that with a simple AD/AM directory that you don't let your
> loose cannon admins have access to and you can pretty easily get things
> back.
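
For readers unfamiliar with the notation: searchflags&0x08 refers to bit 0x08 of an attribute's searchFlags value in the schema, which marks the attribute to be kept on the tombstone when the object is deleted. A minimal sketch of the bit test follows; the function and constant names are made up for illustration, and no directory is contacted.

```python
# Bit 0x08 of searchFlags: keep this attribute on the tombstone at deletion.
PRESERVE_ON_DELETE = 0x08


def preserved_on_delete(search_flags: int) -> bool:
    """True if the attribute's searchFlags value has the 0x08 bit set."""
    return bool(search_flags & PRESERVE_ON_DELETE)


def with_preserve_on_delete(search_flags: int) -> int:
    """Return the searchFlags value with the 0x08 bit added, other bits untouched."""
    return search_flags | PRESERVE_ON_DELETE
```

The more attributes carry this bit, the more of the object survives on the tombstone and the less has to be repopulated by hand after an undelete.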
> 
> 
>  joe
> 
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of David Adner
> Sent: Friday, May 20, 2005 5:24 PM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> 
> Using my non-scientific personal observations, of the last 50 or so
> customers I've been to, I believe only 3 had lag sites.  Of those 3, none
> had done what I'd call a good job of setting it up (they had basically just
> created a separate site with a longer replication interval).  Of the other
> ~47, perhaps half knew of lag sites and were either interested in the
> concept or had plans to implement them.  How many actually will, I can't
> say.  These are all Premier customers.
> 
> So, based on my personal experience, I'm more inclined to agree with Todd.
> I think, however, that over the next couple of years lag sites will become
> the norm as virtualization becomes commonplace and best practices are
> better documented and understood.
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Rick Kingslan
> > Sent: Friday, May 20, 2005 15:49
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> >
> > Todd,
> >
> > With all due respect, I think there are more people doing this than
> > you think.  You aren't using a Lag Site, so it's 'whacky'.  Your
> > opinion, so you're entitled to it.
> >
> > PSS blessed our implementation, BTW.  If you'd like, I'll be happy to
> > provide you with contacts for the ROSS tech (out of Las Colinas) that
> > did our recent AD Health check in advance of our Win2k3/E2k3 upgrade.
> > He stated that this was becoming a cheap, scalable solution to
> > providing DR - and a few large organizations were using them at
> > warm/hot sites because they also meet criteria for DR as addressed and
> > required for Sarbanes.
> >
> > And, I don't question the fact that a poor site design can cause
> > problems.  But, humbly, I submit that I know what I'm doing.  Learn
> > from what I do - or learn not.  That's up to you.  I know that you
> > have a liking for Quest - which is fine.  I use some of their tools -
> > just not Recovery Manager.  However, in a DR situation when your DCs are
> > being rebuilt from scratch - Recovery Manager is not a very valuable
> > tool when there are no objects to 'undelete'.
> >
> > As for Guido - I hope he chimes in as well.  He seems to be one of the
> > few that you trust - regardless of those that have supported you in
> > the past.  Hopefully then - we can put this behind us.  Me, I'll keep
> > doing what has been successful for me for two years, thank you.
> >
> > -rtk
> >
> >
> >
> > ________________________________
> >
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Myrick, Todd
> > (NIH/CC/DNA)
> > Sent: Friday, May 20, 2005 11:59 AM
> > To: [email protected]; [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> >
> >
> >
> > I disagree that Lag sites are popular, maybe with you and at AD
> > conferences as a session.  I tend to avoid those sessions.
> >
> >
> >
> > To all those considering this as a viable solution, why not run it by
> > MSC or PSS and see what they say.  We get something called a
> > supportability review before we implement anything too whacky at my
> > organization.
> >
> >
> >
> > There are so many things that can go wrong with an improper site design
> > and object reanimation that I just say avoid doing it.
> >
> >
> >
> > I am waiting for Guido to chime in on this.
> >
> >
> >
> > Todd
> >
> >
> >
> > ________________________________
> >
> > From: Dan Holme [mailto:[EMAIL PROTECTED]
> > Sent: Thu 5/19/2005 10:16 AM
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> >
> > Two more notes on this issue:
> >
> > 1) THIRD PARTY AD RESTORE TOOLS.  Sounds like it's clear, now, WHY lag
> > sites are so popular.  Yes, there are third party products
> > (particularly Quest Recovery Manager) that work quite well if you have
> > a budget for that.  Here's my take as to why my IT budget shouldn't be
> > spent on those tools (and *should* be spent on OTHER tools by some of
> > those same companies).
> >
> >         a) Deleted objects can be avoided with proper delegation.
> > It's so important that you properly delegate and properly use accounts
> > with administrative logon (i.e. with 'secondary logon' only) that this
> > trumps just about everything.  At most of my clients, NOBODY (from a
> > practical perspective) can delete users or groups.  We have a process we
> > call graveyarding, whereby an account is tagged (using a variety of
> > methods) and, with a SCRIPT, moved to an OU where it stays for 90
> > days before being deleted (again, only by the SCRIPT).  The only other
> > accounts that can delete users and groups are the super-high admins
> > (e.g. Domain Admins equivalents).  This is only a piece of the
> > picture, but it is an important piece.
> >
> >         b) Deleted objects can be restored for FREE using ADRESTORE
> > from Sysinternals.  Granted, this tool brings back only the object
> > (SID, GUID, DN, CN) but that's all that really matters, right?  The
> > best (FREE) approaches we take at clients include *regularly* logging
> > group memberships in a custom database (to compare to last-knowns and
> > watch for issues easily and freely).  So when we restore a group we
> > can repopulate membership quickly, anyway.  So with good processes,
> > it's FREE and easy to restore objects in most situations.
> >
> >         c) Windows Server 2003 SP1 adds a feature that makes
> > reanimating Groups MUCH easier when you have deleted groups & users.
> > No more "auth restore two times" necessary. (Haven't seen it?  Do an
> > auth restore on a group on an SP1 DC and find the LDIF file it
> > creates!!)
> >
> >         d) that leaves only really nasty deletions (e.g. an entire
> > OU), which, given a & b, will probably never happen.
> > And when they do, an auth restore on a lag site takes a very short
> > time.
> >
> >         e) therefore, I save my IT budget and use the $ on tools to
> > aid provisioning, auditing & monitoring, again to avoid problems in
> > the first place.
> >
> > 2) PREVENTING AUTHENTICATION ON LAG SITE.  As I mentioned, the method
> > I've heard of, and that we're testing, is to stop the NetLogon service
> > on the lag DCs.  There are several ways to avoid it restarting when/if
> > the DC is rebooted.  The article referenced in the ORIGINAL post
> > suggested modifying which SRV records are registered.  This should
> > work, I'd guess, and is more elegant.  The trick is that the SRV records
> > are not registered.  The A records still are, so DCs should be able to
> > find each other and replicate successfully, but clients won't 'see'
> > the DCs as a viable authentication option.  I've not tried that
> > approach but it sounded really good.
> >
> > 3) OK, three notes.  LAG SITES can be done with DCs in a site with a
> > long replication interval, or by changing the replication WINDOW
> > (schedule).  It's a good idea to have TWO lag sites on alternating
> > frequencies, to avoid a situation where something awful happens just
> > before a lag site happens to replicate.  Someone detailed this
> > earlier, and it's a good note!
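
The point about two lag sites on alternating frequencies can be checked with a little arithmetic. The sketch below is an illustration only: it assumes each lag site pulls changes at a fixed interval starting from its own offset, both measured in hours, and the function names are invented for this example.

```python
def hours_until_next_pull(now: float, interval: float, offset: float) -> float:
    """Hours from `now` until a site replicating at offset, offset + interval, ... next pulls.

    A change made exactly at a replication instant is treated as having
    missed that cycle, so the full interval is returned in that case.
    """
    elapsed = (now - offset) % interval
    return interval - elapsed


def best_recovery_window(now: float, interval: float, offsets: list[float]) -> float:
    """Longest time any lag site still holds the pre-change data."""
    return max(hours_until_next_pull(now, interval, o) for o in offsets)
```

With a weekly interval (168 hours), a deletion one hour before a single lag site replicates leaves a one-hour window. Adding a second site offset by half a week (84 hours) raises the window for that same deletion to 85 hours, and in general guarantees at least half the interval.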
> >
> > Dan
> >
> >
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Myrick, Todd
> > (NIH/CC/DNA)
> >
> > Sent: Thursday, May 19, 2005 6:34 AM
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
> >
> > Is it cheaper and more efficient to go the replication lag site route
> > than buy a proper backup and object level restore solution?
> >
> > I mean not to toot a vendor's horn, but Quest recovery manager turns
> > the process of restoring objects into a 15 minute click click
> > operation.  I would hate to think of the number of steps you all must
> > do to reanimate the object in a directory using the "Recovery Site".
> >
> > From an operations standpoint, there is no substitute for a proper
> > backup solution and object level restore utility for AD.
> >
> > Thanks,
> >
> > Todd Myrick
> >
> > -----Original Message-----
> > From: TIROA YANN [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, May 19, 2005 4:20 AM
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site
> >
> > Neil,
> >
> > I now understand... I'm a new man now, thanks to the mysterious lag
> > site that has been revealed to me :-))
> >
> > Thanks a lot for your explanations.
> >
> > Cordialement,
> >
> > Yann TIROA
> >
> > Centre de Ressources Informatique.
> > Campus Scientifique de la DOUA.
> > Bât. Gabriel Lippmann - 2ème étage - salle 238.
> > 43, Bd du 11 Novembre 1918.
> > 69622 Villeurbanne Cedex.
> >
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Ruston, Neil
> > Sent: Thursday, 19 May 2005 10:09
> > To: '[email protected]'
> > Subject: RE: [ActiveDir] AD DR - replication lag site
> >
> > If the deletion occurs on DC1, then a DC (DC2) in the lag site will
> > not receive the deletion immediately. You therefore have a window of
> > opportunity in which the deletion may be 'undone'.
> >
> > The deleted object may be auth restored on DC2 and thus replicated /
> > reanimated on DC1 (and any other DC which has received the deletion).
> >
> > [My terminology may not be acceptable to some - I have deliberately
> > explained this in simplistic terms :)]
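
In the same simplistic spirit, the mechanism can be modelled in a few lines. This is a toy model, not how AD is implemented: it assumes a single version number per object and a "higher version wins" rule, whereas real replication compares more than that. The bump value and all names are illustrative.

```python
from dataclasses import dataclass

# Illustrative value: any bump larger than the deletion's version works here.
AUTH_RESTORE_BUMP = 100_000


@dataclass(frozen=True)
class ObjState:
    version: int
    deleted: bool


def delete(state: ObjState) -> ObjState:
    """An ordinary deletion: one new version, object marked deleted."""
    return ObjState(state.version + 1, True)


def auth_restore(state: ObjState) -> ObjState:
    """Mark the live copy authoritative by raising its version well above any other."""
    return ObjState(state.version + AUTH_RESTORE_BUMP, False)


def replicate(local: ObjState, remote: ObjState) -> ObjState:
    """Simplified conflict rule: keep whichever copy has the higher version."""
    return remote if remote.version > local.version else local


# DC1 deletes the object; DC2 in the lag site has not replicated yet.
dc1 = delete(ObjState(1, False))
dc2 = ObjState(1, False)

# Doing nothing: at the next scheduled replication DC2 would accept the deletion.
lost = replicate(dc2, dc1)

# Auth restore on DC2 inside the window: DC1 takes the live copy back.
dc2 = auth_restore(dc2)
dc1 = replicate(dc1, dc2)
```

After the last line dc1 holds the live object again, while `lost` shows what DC2 would have ended up with had the window been missed.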
> >
> > neil
> >
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of TIROA YANN
> > Sent: 19 May 2005 08:54
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site
> >
> >
> >
> > Hello,
> >
> > I must apologize, but I'm a little bit confused. You said "With a lag
> > site, you ONLY have to do an authoritative restore (NTDSUTIL)".
> >
> > Do you mean that if I delete my OU on a DC in site A, all I have to do is
> > an authoritative restore, not in site A, BUT on a DC in the lag site,
> > reboot, and force replication to site A? And the non-authoritative
> > restore is in fact the data already on the lag site, which explains your
> > previous sentence? Wow!
> > That's very clever!!
> >
> > Am I right?
> >
> > Regards,
> >
> > Yann
> >
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Dan Holme
> > Sent: Thursday, 19 May 2005 08:51
> > To: [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site
> >
> > The major issue is the SPEED of recovery.  With a lag site, you ONLY
> > have to do an authoritative restore (NTDSUTIL).
> >
> > Without a lag site, you must first restore the AD from backup tape
> > ('normal'
> > restore), which can take quite some time!!!! Then, and only then, can
> > you do the auth restore.
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of TIROA YANN
> > Sent: Wednesday, May 18, 2005 11:46 PM
> > To: [EMAIL PROTECTED]; [email protected]
> > Subject: RE: [ActiveDir] AD DR - replication lag site
> >
> > Hello,
> >
> > Thanks for these interesting tips, but I didn't really understand the
> > "behind the techno" of a lag site in the case of just a deletion of an
> > entire OU with many objects.
> >
> > For example, if I have an AD 2003 domain with 2 sites:
> > Site A has 2 DCs
> > Site B has one DC and is the lag site
> > Between the 2 sites, I scheduled replication to occur once a week.
> >
> > In the situation of an OU deletion, I go to the DC where I made the
> > deletion, do an authoritative restore in DS restore mode and, after
> > reboot, wait for replication to take place in order to repopulate all my
> > domain with my restored OU. So how will the lag site help me in this
> > situation?
> >
> > I can understand that a lag site will help me if all my DCs in site A
> > crashed. I would then take all the information from the lag site to be
> > restored in site A, i.e. "copy" my domain from the lag site by doing
> > a dcpromo /adv, go to my freshly installed DCs in site A, and
> > restore my whole domain.
> > However, I think I will have more up-to-date information by restoring
> > from yesterday's backup than from the lag site...
> >
> > So, could you help me better understand the "behind the techno" of a lag
> > site? I think I misunderstand something important ;-(
> >
> > Thank you for your feedback.
> >
> > Have a nice day :-)
> >
> > Regards,
> >
> > Yann
> >
> > List info   : http://www.activedir.org/List.aspx
> > List FAQ    : http://www.activedir.org/ListFAQ.aspx
> > List archive:
> > http://www.mail-archive.com/activedir%40mail.activedir.org/
> >
> > ==============================================================
> > ==============
> > ==
> > This message is for the sole use of the intended recipient.
> > If you received this message in error please delete it and notify us.
> > If this message was misdirected, Credit Suisse, its subsidiaries and
> > affiliates (CS) do not waive any confidentiality or privilege. CS
> > retains and monitors electronic communications sent through its
> > network.
> > Instructions transmitted over this system are not binding on CS until
> > they are confirmed by us. Message transmission is not guaranteed to be
> 
> > secure.
> > ==============================================================
> > ==============
> > ==
> >
> >
> >
> 
> 
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. It may contain proprietary material, confidential
> information and/or be subject to legal privilege. It should not be
> copied,
> disclosed to, retained or used by, any other party. If you are not an
> intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender. Thank you.
> 