RE: [ActiveDir] AD DR - replication lag site----Why?

Myrick, Todd (NIH/CC/DNA) Mon, 23 May 2005 03:14:07 -0700

Using the powers of the MVP, I now officially pronounce this thread as
complete :)

Todd

-----Original Message-----
From: Jorge de Almeida Pinto [mailto:[EMAIL PROTECTED] 
Sent: Sunday, May 22, 2005 4:12 PM
To: 'joe '; '[EMAIL PROTECTED] ';
'[email protected] '
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

(1)
In the Netherlands when you own a car and you drive with it, by law you at
least must have a basic insurance that covers liability. This simply means
that when you cause damage with your car the other party gets paid to repair
their damage. You, however, have to pay for your own damage.
This is for those cases when most people cannot afford such high costs. This
also applies to situations in IT. You'll always have to answer: how much can
you afford when this or that occurs?

What I'm trying to say here is: you implement a certain solution to handle
the situations where people have made errors. People simply make mistakes,
and I agree it is better to prevent errors than to repair them. But again,
that is not always possible, because it costs too much money, or people
simply don't care, or they are not aware of the damage that can come from
it, etc. How many times have you heard "it will not happen to me, only to
others"?
There are also a lot of people who only understand (or get interested in)
what you are trying to say/explain when they are in deep sh*t. But then...
it's too late, and much more expensive solutions/activities must be used,
if a solution exists for the occasion at all.
It is always the choice between "trying to save money now and spending a
crap load later each time it happens" or "spending a little bit of money now
and spending less money later". I believe in spending money now to save
later (long-term thinking). A lot of managers only think about spending as
little money as possible. Possible problems in the future are not problems
at the moment (short-term thinking).

(2)
>> When I say rollback, there is nothing left of the forest to get a USN
>> rollback and no worries of TLS.

I understand that an old state of the virtual environment is only used when
ALL other DCs are down, gone, bye bye, etc. (or at least isolated, and some
of the activities mentioned in the MS DR white paper are done, like
resetting all kinds of accounts/trusts).
When I think about it now, you are right; I made a mistake, sorry for that.
Yes, when "giving life" to virtual DCs that belong to the same old state of
the virtual environment, all those virtual DCs know about the state of the
other DCs in the virtual environment.
I do believe this solution provides a very fast recovery of the first DCs in
the forest to be rebuilt.
With this solution, just as with the native way (restoring a backup), if the
tombstone lifetime has passed since the state captured in the virtual images
or backups you use, you will see event 2042 on all DCs (Event ID 2042: It has
been too long since this machine replicated)
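A minimal sketch of the age check behind event 2042, assuming you know when the restored images or backups were captured. The function name and dates are hypothetical, and AD itself performs the real check per replication partner; the 180-day default is only an assumption, so read the actual value from the forest's tombstoneLifetime attribute (forests created before Windows Server 2003 SP1 default to 60 days).

```python
from datetime import datetime, timedelta

def exceeds_tombstone_lifetime(last_replication: datetime,
                               now: datetime,
                               tombstone_lifetime_days: int = 180) -> bool:
    """True if the state a DC was restored to is older than the tombstone
    lifetime, the condition event 2042 complains about.

    180 days is an assumed default; check the forest's tombstoneLifetime
    attribute for the real value (older forests commonly use 60).
    """
    return now - last_replication > timedelta(days=tombstone_lifetime_days)

# Hypothetical example: images captured on 1 January, restored on 22 May.
snapshot = datetime(2005, 1, 1)
restore_day = datetime(2005, 5, 22)
print(exceeds_tombstone_lifetime(snapshot, restore_day))      # False (141 < 180 days)
print(exceeds_tombstone_lifetime(snapshot, restore_day, 60))  # True  (141 > 60 days)
```

The comparison is strictly greater, so an image exactly one tombstone lifetime old is treated as still usable; if you want a safety margin, lower the threshold rather than relying on the edge.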

After bringing the initial environment up, you need to execute the steps
mentioned in the MS DR white paper.
You especially need to consider whether everything really is down, or
whether the corrupt forest stays up to provide functionality for the users
while you restore a new forest in parallel.
And yes, there are a lot of decisions to be made, and each one IS different
for each company.

Cheers
#JORGE#

-----Original Message-----
From: [EMAIL PROTECTED]
To: [email protected]
Sent: 5/22/2005 7:05 PM
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

1.

> I assume almost everyone has an insurance policy for their house if it
> burns down.

In the US, you can't get a mortgage unless you get insurance. Ditto cars:
loans require full coverage and local law enforcement requires some minimal
level of insurance, so you have no choice in the matter. A lot of people buy
insurance not because they want it or would get it themselves, but because
some requirement forces them to. Buying insurance is like enforced gambling:
you are forced to pay to gamble that your house will burn down or you will
crash your car. Insurance companies, who set the prices and hence the
profit, are gambling that you won't have an issue and have weighted the
payoff accordingly. If you ever take advantage of that insurance, you are
pretty much guaranteed to see an increase in your rates at some future point
for taking advantage. Of course, you are also quite likely to get an
increase when Dean's house gets whacked by a hurricane as well.

Insurance on cars and houses is not an optimal example. Maybe a closer one
would be the optional insurance you can get for car malfunctions or
electronics or even MS software... AKA service contracts. "Mr. Jones, you
should protect this TV because it could possibly fail in the next year and
you don't want that expense...." "Mr. Jones, you should protect this MS
environment because you may have something fail in the next year and you
don't want that expense....." Of course, the first thing people start to
wonder when hearing those pitches is why there are warranties at all...
Again, it is gambling, only, unlike the insurance stuff mentioned above, you
have a realistic choice with several options.

> What is that better answer in your opinion? 

The better answer is to understand why this needs to be done and explain
how you can get away from it. I have lived the "Fortune 5, argue until you
are blue in the face about giving out too many permissions" life. When I got
blue in the face and people didn't listen, I did it some more. It wasn't
like I walked in and said some stuff and everyone said, well cheerio, great
idea, let's do it. I had to go through all of the issues that were being
experienced and the goals the company had, and show how locking the
environment down would help with all of it. It was a slow and extremely
painful experience over the years. You can ask Deji; I literally had people
who would have tried to kick my ass had they gotten me alone as I started to
win the arguments.

Again, technology cannot save you from bad policy. It just doesn't have the
power to do it. You have to, as an admin, as a consultant, as someone there
to tell people what is right, have the chutzpah to point out what is stupid
and say, this makes no sense - it is contraindicated by your goals. Without
deviation from the norm, there can be no progress. If I ask someone why they
do something and they say that is the way it has always been done, that
means the something is suspect and needs to be reviewed. If I hear myself
saying it, I review it.

Political issues are something I understand. You want politics, step into a
Fortune 5 company. You would swear you were sitting in a House of Congress.
You have to find the political hot buttons and push them, or point out
different buttons to people who didn't realize they had a button in the
first place. It is very dynamic and one solution will not work everywhere,
but it is the job of any admin or consultant who cares to do their best to
point this out and correct it if they are involved. I have done this in so
many ways for so many things over the years that I don't recall most of it.
I do recall that I was always hated for pointing it out, but in the end,
when things were better, people respected me and liked me even more because
things were better overall.

It is very likely the reason I could do this and not worry is that I have
never had a job I was so overly attached to that I wouldn't walk away from
it or risk being fired. I expect that may be rather unique, but I am pretty
confident in my ability to find a new job, even if it has to be washing
dishes. It has nothing to do with me needing to be right; it is all about
doing the right thing. Bowing to political pressure is almost never the
right thing. If you cave in to it and things go wrong later, politicians are
good at pointing at you and saying, he/she went along with it and that is
our technical information source...

In the meantime, if you are the admin running a dangerous environment,
protect yourself. If you choose a lag site, so be it. If you know how to set
it up properly and use it and want to be in the world of authoritative
restores of objects, knock yourself out. I would choose the option of
populating the data to a protected store and using undelete, or using x08, a
protected store, and undelete. Overall I think it is an easier, faster, and
cheaper solution. If I was in an environment where undelete wasn't
available, I would fight like hell to get it there, or take away people's
rights to stick a hot poker up my butt and deal with the political
consequences. I've done that. I would rather spend 80 hours a week doing
other people's jobs than 40 hours a week trying to put crap back together
that some bonehead who shouldn't have had the ability screwed up.

> In the end there is a big difference between "being right" and
> "getting it"!

Agreed. But the fact that it is hard, or that someone said it isn't going to
happen, isn't a reason not to pursue it. Again, I have been told so many
times that something would never happen, then turned around and worked it
and worked it and worked it and seen it happen a few months or years later.
You just have to be willing to look at what is wrong, point out the issues,
and then every time something happens that your solution would have helped
with, point out again why that solution needs to be implemented. I have no
problem looking an Exec or Director or Manager or anyone else in the eye and
saying, "this wouldn't have been an issue if we were doing this right" or
even, "this is what I told you about before, do you understand what I was
saying now?". If your manager doesn't listen, escalate.

In summary, if you do silly things, you will be bitten. You can try to put
things in place so that you bleed less, but you still end up with a bite and
probably an ugly scar. You certainly spend a lot of time putting on
antibacterial healing salves, time that could be better spent doing
something else.

I have something that I tend to say a lot to admins: don't get into a
situation where you are so busy chopping down trees that you can't sharpen
your axe. It is a losing position. If you find yourself there, you must use
your own time and resources to make time to sharpen your axe, or else you
are doomed to chopping down a forest you can never get to the end of,
because you get slower and slower as your axe grinds down to a nub and then
you can't accomplish anything. That isn't a fun position to be in. Work
isn't supposed to be something you hate; you shouldn't actually dislike most
of the week. If you do, you are doing something wrong or you are in the
wrong job.

2. 

> However I do think you have to be careful with this because of
> USN rollback, tombstone lifetime and replication and maybe some
> other stuff as the DCs are (I think) not recovered using the native MS
> way.

When I say rollback, there is nothing left of the forest to get a USN
rollback and no worries of TLS. I am talking about every DC being just gone,
shut off, fdisked, etc. The rolled-back environment comes up and is the
entire environment. Even if a DC that isn't part of that is up, it is wiped
from the seed environment and gone from name resolution. The issues you
could have will be SID issues on resource ACLs and reverted passwords on any
objects that have passwords. However, if you got to this point, that is
called acceptable losses, and you clean it up after you have the business
functioning again.

As an example, we had this exact plan for every schema update deployment
and any seriously major update: the world-has-ended scenario, something that
wasn't found in Test or QA that knocks a global forest down for the count.
We had a lifeboat with a DC or two from every domain that was off the main
network. Everyone was ready to power down their local DCs and insert rebuild
CDs based on a call from our group. The only unknown was when to make that
call.

That point is variable and has to be. Unless you know exactly how long you
can be without the environment as a whole and how long it will take to
rebuild to an acceptable level, you will be making an ad hoc judgement of
when to go whole hog on recovery. If you know it takes 5 hours to get to an
acceptable level and you know you can't be fully down for more than 12
hours, you know that if you have a complete outage, at the 7 hour mark your
decision is made for you.
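The arithmetic in the 5-hour/12-hour example is just the maximum tolerable outage minus the time a full rebuild takes. A tiny sketch; the function name is hypothetical, and real plans rarely have estimates this clean.

```python
def decision_deadline(max_outage_hours: float, rebuild_hours: float) -> float:
    """Hours into a complete outage by which a full rebuild must start
    so the environment is back at an acceptable level in time."""
    if rebuild_hours > max_outage_hours:
        # No start time works: the recovery plan itself cannot meet the requirement.
        raise ValueError("rebuild cannot finish within the tolerable outage")
    return max_outage_hours - rebuild_hours

print(decision_deadline(12, 5))  # 7
```

If troubleshooting is still going on at that mark, the decision has been made for you; if the function raises, the plan itself needs work before any outage happens.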

When to make the decision depends on what your recovery plan is, how long it
takes to execute, what is dependent on you and what priority those dependent
things have for getting back up, and many other things. If you have a full
forest failure, you don't need every DC of the forest back up and running
immediately unless you only have one DC per domain, and if that is the case,
you tend to have a fairly simple recovery process. More than likely you can
have a pool of machines at a DataCenter hub that can be brought up to service
the environment in a minimal fashion while you get more up. So what is that
minimal fashion? Do you know for your company? It will be different for
every single company. What applications or services must be available in 2
hours, in 4 hours, in 8 hours, etc.? Are your plans structured for that? Do
you have enough money and resources to accomplish
it? 
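One way to make those questions concrete is to list each service with the number of hours within which it must be available and the number of hours your recovery plan actually needs to restore it, then flag the gaps. The class, function, service names, and figures below are invented placeholders for illustration, not data from any real environment.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Service:
    name: str
    required_within_hours: float   # business requirement (2, 4, 8, ...)
    planned_restore_hours: float   # estimate taken from the recovery plan

def unmet(services: List[Service]) -> List[str]:
    """Names of services the recovery plan restores too late."""
    return [s.name for s in services
            if s.planned_restore_hours > s.required_within_hours]

# Invented placeholder figures, not measurements from any real company.
services = [
    Service("logon/authentication", 2, 1),
    Service("mail", 4, 6),
    Service("branch file shares", 8, 8),
]
print(unmet(services))  # ['mail']
```

Anything this returns is either a plan that needs more resources or a requirement that needs renegotiating with whoever owns that service.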

You will of course try to troubleshoot the issue. At some point you could
make the decision, or be forced to make the decision, to recover from
scratch. Have you thought about the situations that could cause that, and do
you have an inkling of a plan to do it? Do you know the global
prioritization of apps and services that depend on your resources? Is there
one? In many, or probably even most, companies there isn't; everyone
assumes they are #1.

In the last place I was at where it was my job to think about that stuff,
the priorities were basically this: get something up; this is the seed, and
it is enough to get other DCs started being built. Once that next set of
DCs is built, it services the DataCenters for Data Center specific stuff;
then you get more DCs up to handle some small amount of WAN traffic coming
in. Then you get more and more capacity for WAN traffic. In the end, you
work on getting the WAN sites up. The seed should be up and running in less
than an hour. IMO, done properly, it should only take as long as it takes to
copy the proper virtual disk images to a machine that can run them. Then you
start dumping IFM images and spinning up real DCs to handle Data Center apps
(they are at the data center because they are important, right?). This
second stage can vary but shouldn't take any longer than a couple of hours.
How long does it take you to build a server from the ground up? Do you have
an automated load process? If not, why not? Once the servers are built,
promote them with IFM, and now you have your initial data center structure.
Take a breath and build up some extra capacity DCs for the datacenter. If
you have enough resources, you did this when building the initial real
DataCenter DCs. Getting Data Centers up and running enough to handle local
and WAN requests should be something that takes hours, not days. Days,
weeks, months is the time frame for WAN site DCs. That doesn't mean the WAN
sites aren't working; they just aren't working in an optimal state. If you
are doing cool virtual stuff where it isn't just your seed using the virtual
environment, you could have your environment up very quickly. Watch Dean
roll back 20 or 30 machines in a few seconds some day and you get an idea of
what can be done with virtualization.

3.

My thoughts on this are always to simply rebuild. A DC shouldn't be unique
enough to require a restore from backup. As Rick mentioned previously, they
are little tin soldiers. Knock them down as you need to. IMO, a properly
configured environment can have just about any DC knocked down at nearly any
time without many adverse effects, thanks to failover and fault tolerance.
There are examples where this isn't the case, such as apps that point at a
specific DC. If you encounter apps like this, beat on them until they fix
it. If it is a synced type app (like, say, the RUS), you will be hard
pressed to get it changed. If it is simply an app using AD for auth or data
retrieval, there is no excuse for letting it hard point to one machine. I
have been in many fights with integrators and developers of Unix/Java based
apps that only allow them to select one machine. My response to them is:
find a better way or else they WILL break. I will not pony up additional
responsibility because their app doesn't work properly.

   joe

-----Original Message-----
From: Jorge de Almeida Pinto
[mailto:[EMAIL PROTECTED] 
Sent: Sunday, May 22, 2005 8:41 AM
To: 'joe '; '[EMAIL PROTECTED] ';
'[email protected] '
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

Hi,

In my opinion the following recovery situations exist when it comes to AD:
(1) Accidental object deletions
(2) Your forest/domain drops dead
(3) A DC drops dead

(1) Accidental object deletions
I agree with Joe that people should only have the permissions needed to do
their work, and this should be configured accordingly. I also agree that not
too many people should have domain/enterprise admin permissions. However, in
the real world this is not always possible for a lot of reasons (history,
politics, etc.). Organizations are not 100% perfect; that's a fact too.
Looking to the future and preparing for the worst, solutions are
implemented to mitigate those risks. Costs are incurred in advance to save
time and money in the future. It's somewhat like an insurance policy: when
something goes wrong I have something to fall back on. I assume almost
everyone has an insurance policy for their house if it burns down. How many
times will you use that insurance policy in your lifetime? Never if you're
lucky... once if you're unlucky... twice if you're really unlucky!
In the case of accidental object deletions, customers need/want a solution!
What is that solution? Is it a lag site, is it a tool like Quest Recovery
Manager, is it a tool like Guido's tool, is it something else I/we don't
know about yet? It all depends on the functionality needed by the customer
and the cost to implement and maintain the tool/solution.

In my opinion a lag site is one of those solutions for accidental object
deletions, as always only when implemented correctly.

Joe (and others), you don't recommend setting up lag sites as there could be
a better answer. What is that better answer in your opinion? What would you
do if a customer said to you: "I want to have ADMIN rights and I want to be
able to delete objects in my forest/domain and I want you to provide a
solution for me if I delete the wrong objects"? (The answer "take away admin
rights" is not an option ;-)) What is your solution for accidental object
deletions? That is what I'm interested in.

In the end there is a big difference between "being right" and "getting it"!

(2) Your forest drops dead
I don't think lag sites are a solution when your forest drops dead,
especially in a large environment. What's the primary goal to achieve when
your forest drops dead (and what's the second?)? (Please give me answers.)
When the forest drops dead, nobody can do anything anymore.
In my opinion the first goal to achieve is to get everything up and running
as fast as possible and provide as much functionality as possible to the end
users. In my opinion the second goal is to repair the health of the forest
and, if it is really screwed, rebuild it. So for this you need a procedure
that accommodates those situations.
I always hear everyone talk about a forest recovery as rebuilding the forest
from scratch. Rebuilding a forest because it dropped dead should be (again,
in my opinion) the last step ever taken, because it means you're going back
in time and therefore you will lose info. I believe there is more between a
healthy forest and a forest that needs to be rebuilt.
Do you guys agree?

As for the "virtualized environment that can be rolled back to any point in
time", I think that can be part of a solution to start rebuilding a forest.
However I do think you have to be careful with this because of USN rollback,
tombstone lifetime and replication and maybe some other stuff, as the DCs
are (I think) not recovered using the native MS way. At DEC I heard Dean and
Joe and some other guys talk about this method. Unfortunately I did not hear
the complete story behind it and, to be honest, I have not put any time into
thinking about it and how it may work as a quick start for a forest rebuild.

(3) A DC drops dead
We all know this one.
Restore the DC from a backup, or do a metadata cleanup and rebuild the DC
from scratch.

Cheers,
#JORGE#

-----Original Message-----
From: [EMAIL PROTECTED]
To: [email protected]
Sent: 5/22/2005 1:15 AM
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

Reread it Deji, I really am not agreeing with it. I noted that it might be
something that could be used for whole forest corruption, but I would much
prefer a virtualized environment that can be rolled back to any point in
time over a site lagging behind the main AD in *hopes* that it didn't get
poisoned.

To make it more obvious, I guess: I don't recommend lag sites. However, I
don't recommend people tear them down if they have them. Mostly I don't
recommend setting them up in the first place unless people are fully aware
of why they are doing it and why they think there is no better answer.
Technology doesn't often successfully make up for bad policy.

What I recommend is that they batten down the hatches even if they think
they can't because it has always been this way, or because some Exec who
needs to be taught better thinks L1 Help Desk should be able to delete
things in an unhindered, unconfirmed way, etc. I recommend they use x08 and
admod, and I recommend they talk to Guido about a product he has put
together to recover stuff, which combines undelete with repopulate. Mostly I
recommend not allowing accident-prone folks to have the power to piss in
your wheaties. I have never had a case where I took away permissions for
other people to do things and my life didn't get easier and the environment
didn't get more stable and secure. I don't know how many times I have heard
"but I can't do my job without those god level rights", and sure enough,
without those god level rights, they can still do their job. The difficulty
here is convincing the right people that this is the right way to go, and it
is often defeated when the people pushing for the lockdown can't argue the
technical merits or can't come up with answers to questions on how to do
the work in alternate ways. That is tough work, I know; I spent many hours
working through those issues myself. More than once I took work home with me
and cracked open MSDN trying to find a better, safer way for a developer to
do something. If I couldn't find an alternate method, I built some sort of
delegation tool to do the work on their behalf, or stepped up to the plate
and said I would do that work when they requested it (and then worked like
heck to find a better way). I would much rather sign up for a lot of work
than give out too many permissions, even for a short period of time. Not
giving rights is much easier than taking them back later.

Back to lag sites: if someone has a lag site and they like it and find it
useful, I am behind their use of it. Of course, my question to them, if I
was paid to look at their environment and comment on that aspect of it, is
"Why do you feel you need it?". Is this something you find yourself using a
lot? Do you have any thoughts that possibly this is indicative of some other
type of issue that could be prevented rather than reacted to?

The Microsoft world has yet to really learn from the mainframe world. Maybe
because it is old, people think it isn't good. The mainframe model is quite
locked down. You don't give a ton of people rights; people have what they
need to do their exact job, and even that goes through a ton of
filters/processes/batch. Rarely, if ever, does anyone get core-level change
access rights that aren't run through rules and logging. Why? Because it is
bad to allow just any old changes. Nearly any change in the mainframe world
is change controlled to within an inch of its life. I think this is good
for MS tech as well. It will get there as we mature; we see it happening
now. Having lots of people who can make changes ad hoc does not increase the
flexibility and mobility of a company; if anything, in my opinion, it makes
support more costly by making the environment more difficult to support and
understand.

  joe

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Saturday, May 21, 2005 3:45 AM
To: [email protected]
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

Joe, you pretty much agreed with the lag site proposition towards the end of
your piece. Whether you virtualize it, put it in a different physical
location, or just put it on a piece of hardware sitting in the same server
room configured with a different replication schedule, it all comes down to
the same necessity: having a pristine DC that has not received your deletion
and from which you can repopulate your F'ed up AD.
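Whether that DC is still pristine comes down to timing: the deletion has to be noticed before the lag site's next scheduled replication picks it up. A minimal sketch, assuming the lag site pulls changes only at fixed intervals starting from a known time (real site-link schedules can be more irregular); the function name, parameters, and times are hypothetical.

```python
from datetime import datetime, timedelta

def lag_site_still_pristine(deleted_at: datetime, detected_at: datetime,
                            first_sync: datetime,
                            interval: timedelta) -> bool:
    """True if no scheduled lag-site replication falls in
    [deleted_at, detected_at], so the lag site has not yet received the
    deletion. Assumes syncs happen only at first_sync + k * interval."""
    if detected_at < deleted_at:
        raise ValueError("detection cannot precede the deletion")
    if deleted_at <= first_sync:
        next_sync = first_sync
    else:
        # Ceiling division with timedelta // timedelta (an int) keeps it exact.
        periods = -(-(deleted_at - first_sync) // interval)
        next_sync = first_sync + periods * interval
    return detected_at < next_sync

# Hypothetical lag site that replicates once a day at 02:00.
first = datetime(2005, 5, 20, 2)
daily = timedelta(hours=24)
print(lag_site_still_pristine(datetime(2005, 5, 20, 9), datetime(2005, 5, 20, 17), first, daily))  # True
print(lag_site_still_pristine(datetime(2005, 5, 20, 9), datetime(2005, 5, 21, 8), first, daily))   # False
```

A sync that lands exactly at the moment of deletion is counted as carrying it, which errs on the conservative side.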

I know that you think deletions should not happen, but I have seen a few,
so they do happen in reality. We've been over the discussion of the politics
behind rights and permissions in many organizations, and how they are what
they are because we can't control them. So, bad things happen. If you are
rolling in surplus money, you get a tool. If you are cash-strapped or like
to roll your own, you get a quarantine (lag) site.

I do not think one is better than the other.

Sincerely,

Deji Akomolafe, MCSE+M MCSA+M MCP+I
Microsoft MVP - Directory Services
www.readymaids.com - we know IT
www.akomolafe.com
Do you now realize that Today is the Tomorrow you were worried about
Yesterday?  -anon

________________________________

From: [EMAIL PROTECTED] on behalf of joe
Sent: Fri 5/20/2005 10:07 PM
To: [email protected]
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

I would tend to agree with what David is saying, from what I have seen of
lag sites as well.

Relatively few people are doing it, and those that are are likely to be
doing it in a roughshod way.

I am not a huge fan of lag sites. I think they are OK, but, for instance, I
didn't think they deserved 3 or 4 different speakers talking about them at
the DEC in DC a couple of years ago.

I am far more interested in taking away people's rights to do the stupid
deletions in the first place, as was mentioned previously. Seriously, I have
done 0, count them, 0 restores of objects in production, and I have been
involved in some rather seriously sized implementations, including 5 years
as lead AD tech for a Fortune 5 directory. The lax "accidental deletions
happen" mentality is not one I am likely to subscribe to. If someone deleted
something, my feeling is that they knew what they were doing and were
adequately aware of what they did.

First off, don't delete right off. Disable, rename, and move.

Second off, don't do admin through the GUI; it is too easy to click on an OU
instead of a single user when deleting.

Third off, don't let people have the power to delete things. Let them
request deletes from automated systems that are designed to follow good
rules and so appear to be smarter than the admins.
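The "disable, rename, and move" rule is the sort of thing such an automated request system would enforce. The sketch below is a hypothetical, in-memory model with no real AD calls; the holding OU name, the "DEL-" prefix, and the 30-day grace period are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

PENDING_OU = "OU=PendingDelete"  # hypothetical holding OU
GRACE = timedelta(days=30)       # hypothetical wait before a real delete

@dataclass
class Account:
    name: str
    ou: str
    enabled: bool = True
    original: Optional[Tuple[str, str]] = None  # (name, ou) kept for undo
    pending_since: Optional[datetime] = None

def request_delete(acct: Account, now: datetime) -> None:
    """Soft delete: disable, rename, and move instead of deleting."""
    acct.original = (acct.name, acct.ou)
    acct.enabled = False
    acct.name = "DEL-" + acct.name
    acct.ou = PENDING_OU
    acct.pending_since = now

def undo_delete(acct: Account) -> None:
    """Reverse a soft delete from the saved name and OU."""
    if acct.original is None:
        raise ValueError(f"{acct.name} is not pending deletion")
    acct.name, acct.ou = acct.original
    acct.enabled = True
    acct.original = None
    acct.pending_since = None

def purge_candidates(accounts: List[Account], now: datetime) -> List[str]:
    """Soft-deleted accounts whose grace period has run out."""
    return [a.name for a in accounts
            if a.pending_since is not None and now - a.pending_since >= GRACE]

a = Account("jdoe", "OU=Sales")
request_delete(a, datetime(2005, 5, 1))
print(a.name, a.ou, a.enabled)                      # DEL-jdoe OU=PendingDelete False
print(purge_candidates([a], datetime(2005, 6, 1)))  # ['DEL-jdoe']
```

Because nothing is actually deleted until the grace period ends, a wrong request is undone by reversing three attribute changes rather than by an authoritative restore.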

There were mentions of supportability, etc. I would not be surprised to hear
MS say this is supported. Honestly, it isn't that whacky from a technical
standpoint. However, if someone has gone through the supportability review
process, I *HIGHLY* recommend they keep any and all docs with the names of
the MS people involved locked up and saved. More than once over the years I
was told something was supported and fine, and then several years later
they were looking at me saying they would never have approved this or that.
Some of the time I didn't have docs and was screwed, as MS, I have found, is
fond of saying "we don't have any documentation of that being said or being
done"; other times I had docs and then saw PSS trying to find reasons why
they missed the issue, or something else in the doc not being followed that
they try to imply makes the whole thing moot. Unfortunately PSS will declare
a lot of things unsupportable even if they have no good answer themselves,
for instance, scripted GPO deployment pre-GPMC. For several years before
GPMC was released, people were forced to come up with their own mechanisms
for scripted GPO deployment because the normal GUI just wouldn't cut it, and
they are all unsupported by MS. Unfortunately, companies won't tend to find
out until they contact MS about it or PSS stumbles upon it.

Back to lag sites: you, of course, have the possibility of directory
corruption, etc., where you lose the entire directory in one fell swoop. A
lag site could be used here, but an auth restore is probably not going to be
what you need to save you; you need to rebuild everything. Personally, over
a lag site I would use a site with a bunch of virtual DCs that you take down
together and back up the disk images of. Then, if you need to roll back, you
pick the day or 4, 6, 8, or 12 hour period and roll back to it once
everything else has been taken offline, and you build the rest of your
environment back out from this "seed" environment. This gives you the
additional benefit of having an environment you can take into a segregated
lab and test stuff in any time you need to. It just needs to be done right
or you will have Brett snickering at you.
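Choosing the roll-back point can be sketched as taking the newest disk image captured strictly before the corruption started. This assumes you keep a sorted list of image timestamps and know roughly when the damage began; the function name and the 6-hour schedule are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional

def pick_rollback_image(image_times: List[datetime],
                        corruption_time: datetime) -> Optional[datetime]:
    """Newest image captured strictly before the corruption started,
    or None if no image is old enough. image_times must be sorted."""
    i = bisect_left(image_times, corruption_time)
    return image_times[i - 1] if i > 0 else None

# Hypothetical schedule: images taken every 6 hours on one day.
images = [datetime(2005, 5, 20, h) for h in (0, 6, 12, 18)]
print(pick_rollback_image(images, datetime(2005, 5, 20, 13, 30)))
# 2005-05-20 12:00:00
```

Requiring the image to be strictly older than the corruption time is deliberately conservative; if the start of the damage is uncertain, pass an earlier time.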

As I mentioned in an earlier post, if you are afraid of deleted objects, I
would recommend judicious use of searchFlags & 0x08 and admod with the
-undel option. Couple that with a simple ADAM directory that you don't let
your loose-cannon admins have access to, and you can pretty easily get
things back.
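For reference, 0x08 in an attribute's searchFlags is the bit that tells AD to keep that attribute's value on the tombstone when the object is deleted, so an undelete (for example with admod -undel) gets more of the object back. A minimal sketch of testing and setting that bit on an integer value; actually reading and modifying the schema attribute is not shown and should go through your normal schema change process.

```python
PRESERVE_ON_DELETE = 0x08  # searchFlags bit: keep the value on the tombstone

def preserves_on_delete(search_flags: int) -> bool:
    """True if the attribute's value survives on the tombstone."""
    return (search_flags & PRESERVE_ON_DELETE) != 0

def with_preserve_on_delete(search_flags: int) -> int:
    """searchFlags with 0x08 set; all other bits are left unchanged."""
    return search_flags | PRESERVE_ON_DELETE

print(preserves_on_delete(1))       # False: only the index bit (0x01) is set
print(with_preserve_on_delete(1))   # 9 (0x01 | 0x08)
```

Using a bitwise OR rather than overwriting the value keeps any indexing or other flags the attribute already has.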

  joe

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Adner
Sent: Friday, May 20, 2005 5:24 PM
To: [email protected]
Subject: RE: [ActiveDir] AD DR - replication lag site----Why?

Using my non-scientific personal observations, of the last 50 or so
customers I've been to, I believe only 3 had lag sites.  Of those 3, none
had done what I'd call a good job of setting it up (they had basically just
created a separate site with a longer replication interval).  Of the other
~47, perhaps half knew of lag sites and were either interested in the
concept or had plans to implement them.  How many actually will, I can't
say.  These are all Premier customers.

So, based on my personal experience, I'm more inclined to agree with Todd.
I think, however, that over the next couple of years lag sites will become
the norm as virtualization becomes commonplace and best practices are better
documented and understood.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rick Kingslan
> Sent: Friday, May 20, 2005 15:49
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
>
> Todd,
>
> With all due respect, I think there are more people doing this than 
> you think.  You aren't using a Lag Site, so it's 'whacky'.  Your 
> opinion, so you're entitled to it.
>
> PSS blessed our implementation, BTW.  If you'd like, I'll be happy to 
> provide you with contacts for the ROSS tech (out of Los Colinas) that 
> did our recent AD Health check in advance of our Win2k3/E2k3 upgrade.
> He stated that this was becoming a cheap, scalable solution to 
> providing DR - and a few large organizations were using them at 
> warm/hot sites because they also meet criteria for DR as addressed and

> required for Sarbanes.
>
> And, I don't question the fact that a poor site design can cause 
> problems.  But, humbly, I submit that I know what I'm doing.  Learn 
> from what I do - or learn not.  That's up to you.  I know that you 
> have a liking for Quest - which is fine.  I use some of their tools - 
> just not Recovery Manager.
>  However, in a DR situation when your DCs are being rebuilt from 
> scratch - Recovery Manager is not a very valuable tool when there are 
> no objects to 'undelete'.
>
> As for Guido - I hope he chimes in as well.  He seems to be one of the
> few that you trust - regardless of those that have supported you in 
> the past.  Hopefully then - we can put this behind us.  Me, I'll keep 
> doing what has been successful for me for two years, thank you.
>
> -rtk
>
> 
>
> ________________________________
>
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Myrick, Todd (NIH/CC/DNA)
> Sent: Friday, May 20, 2005 11:59 AM
> To: [email protected]; [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
>
> 
>
> I disagree that lag sites are popular, except maybe with you and as a 
> session topic at AD conferences.  I tend to avoid those sessions.
>
> 
>
> To all those considering this as a viable solution, why not run it by 
> MCS or PSS and see what they say?  We get something called a 
> supportability review before we implement anything too whacky at my 
> organization.
>
> 
>
> There are so many things that can go wrong with an improper site design
> and object reanimation that I just say avoid doing it.
>
> 
>
> I am waiting for Guido to chime in on this.
>
> 
>
> Todd
>
> 
>
> ________________________________
>
> From: Dan Holme [mailto:[EMAIL PROTECTED]
> Sent: Thu 5/19/2005 10:16 AM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
>
> Two more notes on this issue:
>
> 1) THIRD-PARTY AD RESTORE TOOLS.  Sounds like it's clear, now, WHY lag
> sites are so popular.  Yes, there are third-party products 
> (particularly Quest Recovery Manager) that work quite well if you have
> a budget for that.  Here's my take as to why my IT budget shouldn't be
> spent on those tools (and *should* be spent on OTHER tools by some of 
> those same companies).
>
>         a) Accidental deletions can be avoided with proper delegation. 
> It's so important that you properly delegate and properly use accounts
> with administrative logon (i.e. with 'secondary logon' only) that this
> trumps just about everything.  At most of my clients, NOBODY (from a 
> practical perspective) can delete users or groups.  We have a process
> we call graveyarding, whereby an account is tagged (using a variety of
> methods) and, with a SCRIPT, moved to an OU where it stays for 90 
> days before being deleted (again, only by the SCRIPT).  The only other
> accounts that can delete users and groups are the super-high admins 
> (e.g. Domain Admins equivalents).  This is only a piece of the 
> picture, but it is an important piece.
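A minimal sketch of the decision logic a graveyarding script like the one described might apply on each scheduled run. The Account fields, the plan_graveyard_run function and the sample DNs are hypothetical; the part that actually moves or deletes objects in AD is deliberately left out, so this only decides what should happen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Retention period from the post: accounts sit in the graveyard OU for 90 days.
GRAVEYARD_DAYS = 90


@dataclass
class Account:
    dn: str                              # hypothetical distinguished name
    tagged: bool = False                 # marked for retirement (tagging method not modeled)
    moved_on: Optional[datetime] = None  # when the script moved it into the graveyard OU


def plan_graveyard_run(accounts: List[Account], now: datetime) -> Tuple[List[str], List[str]]:
    """Return (to_move, to_delete) DN lists for one run of the script.

    Only tagged accounts are touched, and an account is only slated for
    deletion after GRAVEYARD_DAYS in the graveyard OU. Nothing here talks
    to the directory.
    """
    to_move: List[str] = []
    to_delete: List[str] = []
    for acct in accounts:
        if not acct.tagged:
            continue  # untagged accounts are never touched
        if acct.moved_on is None:
            to_move.append(acct.dn)  # tagged but still in its original OU
        elif now - acct.moved_on >= timedelta(days=GRAVEYARD_DAYS):
            to_delete.append(acct.dn)  # has served its time in the graveyard
    return to_move, to_delete


if __name__ == "__main__":
    now = datetime(2005, 5, 19)
    accounts = [
        Account("CN=alice,OU=Staff,DC=example,DC=com", tagged=True),
        Account("CN=bob,OU=Graveyard,DC=example,DC=com", tagged=True,
                moved_on=now - timedelta(days=91)),
        Account("CN=carol,OU=Graveyard,DC=example,DC=com", tagged=True,
                moved_on=now - timedelta(days=10)),
        Account("CN=dave,OU=Staff,DC=example,DC=com"),
    ]
    print(plan_graveyard_run(accounts, now))
```

Keeping the "plan" separate from the "act" step makes it easy to log or review what the script intends to delete before anything leaves the directory.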
>
>         b) Deleted objects can be restored for FREE using ADRESTORE 
> from Sysinternals.  Granted, this tool brings back only the object 
> (SID, GUID, DN, CN) but that's all that really matters, right?  The 
> best (FREE) approaches we take at clients include *regularly* logging 
> group memberships in a custom database (to compare to last-knowns and 
> watch for issues easily and free-ly).  So when we restore a group we 
> can repopulate membership quickly, anyway.  So with good processes, 
> it's FREE and easy to restore objects in most situations.
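A minimal sketch of the "compare to last-known" step, assuming each snapshot in the custom database is simply a map of group DN to a set of member DNs. How the snapshots are collected and stored is not described in the post and is not modeled; the function name and DNs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical snapshot format: group DN -> set of member DNs.
Snapshot = Dict[str, Set[str]]


def diff_memberships(last_known: Snapshot, current: Snapshot) -> Dict[str, Tuple[List[str], List[str]]]:
    """Compare two membership snapshots.

    Returns {group_dn: (added, removed)} for every group whose membership
    changed. A group missing from `current` (for example because it was
    deleted) reports all of its last-known members as removed, which is the
    list you would need to repopulate it after restoring the group object.
    """
    changes: Dict[str, Tuple[List[str], List[str]]] = {}
    for group in sorted(last_known.keys() | current.keys()):
        before = last_known.get(group, set())
        after = current.get(group, set())
        added, removed = sorted(after - before), sorted(before - after)
        if added or removed:
            changes[group] = (added, removed)
    return changes


if __name__ == "__main__":
    yesterday = {
        "CN=Sales,OU=Groups,DC=example,DC=com": {"CN=alice", "CN=bob"},
        "CN=HR,OU=Groups,DC=example,DC=com": {"CN=carol"},
    }
    today = {"CN=Sales,OU=Groups,DC=example,DC=com": {"CN=alice", "CN=dave"}}
    for group, (added, removed) in diff_memberships(yesterday, today).items():
        print(group, "added:", added, "removed:", removed)
```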
>
>         c) Windows Server 2003 SP1 adds a feature that makes 
> reanimating Groups MUCH easier when you have deleted groups & users.
> No more "auth restore two times" necessary. (Haven't seen it?  Do an 
> auth restore on a group on an SP1 DC and find the LDIF file it
> creates!!)
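As I understand it, the LDIF file written during an SP1 auth restore carries the group-membership links to re-apply. A minimal sketch of what such link records look like in standard LDIF (RFC 2849) change syntax, assuming plain-ASCII DNs (values needing base64 encoding or line folding are not handled). This is not the exact file ntdsutil writes; the function name and DNs are hypothetical. A file like this would be imported with an LDIF tool such as ldifde.

```python
from typing import Dict, Iterable, List, Tuple


def member_links_ldif(links: Iterable[Tuple[str, str]]) -> str:
    """Build LDIF 'changetype: modify' records that add `member` values.

    `links` holds (group_dn, member_dn) pairs; all members of one group are
    collected into a single modify record. Assumes plain ASCII DNs, so no
    base64 ("dn::") encoding or line folding is done.
    """
    by_group: Dict[str, List[str]] = {}
    for group_dn, member_dn in links:
        by_group.setdefault(group_dn, []).append(member_dn)

    records = []
    for group_dn, members in by_group.items():
        lines = [f"dn: {group_dn}", "changetype: modify", "add: member"]
        lines += [f"member: {m}" for m in members]
        lines.append("-")  # ends this modification within the record
        records.append("\n".join(lines))
    # A blank line separates LDIF records.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    print(member_links_ldif([
        ("CN=Sales,OU=Groups,DC=example,DC=com", "CN=alice,OU=Staff,DC=example,DC=com"),
        ("CN=Sales,OU=Groups,DC=example,DC=com", "CN=bob,OU=Staff,DC=example,DC=com"),
    ]))
```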
>
>         d) That leaves only really nasty deletions (e.g. an entire 
> OU), which, given a & b, will probably never happen.
> And when they do happen, an auth restore on a lag site takes a very
> short time.
>
>         e) Therefore, I save my IT budget and use the $ on tools to 
> aid provisioning, auditing & monitoring, again to avoid problems in 
> the first place.
>
> 2) PREVENTING AUTHENTICATION ON LAG SITE.  As I mentioned, the method 
> I've heard of, and that we're testing, is to stop the NetLogon service
> on the lag DCs.  There are several ways to keep it from restarting
> when/if the DC is rebooted.  The article referenced in the ORIGINAL post 
> suggested modifying which SRV records are registered.  This should 
> work, I'd guess, and is more elegant.  The trick is that the DC's SRV
> records are not registered.  The A records still are, so DCs should be
> able to find each other and replicate successfully, but clients won't
> 'see' the DCs as a viable authentication option.  I've not tried that 
> approach but it sounded really good.
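As far as I know, the SRV-record approach is done with Netlogon's DnsAvoidRegisterRecords registry value (a list of record mnemonics under HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters) on the lag DCs. The sketch below only models which records would remain registered. The table is a subset of the mnemonics, assumes a single-domain forest, leaves out the GC records a lag GC would also need to suppress, and should be checked against Microsoft's DC locator documentation before use. The function name, avoid list and sample values are hypothetical.

```python
from typing import Dict, Iterable

# Subset of Netlogon DNS record mnemonics and the owner names they register.
# Assumes a single-domain forest, so the forest name equals the domain name.
# Verify against Microsoft's documentation before relying on this table.
RECORD_TEMPLATES: Dict[str, str] = {
    "Ldap": "_ldap._tcp.{domain}",
    "LdapAtSite": "_ldap._tcp.{site}._sites.{domain}",
    "Dc": "_ldap._tcp.dc._msdcs.{domain}",
    "DcAtSite": "_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
    "Kdc": "_kerberos._tcp.dc._msdcs.{domain}",
    "KdcAtSite": "_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
    "Rfc1510Kdc": "_kerberos._tcp.{domain}",
    "DsaCname": "{dsa_guid}._msdcs.{domain}",  # CNAME replication partners use to find this DC
}


def registered_records(avoid: Iterable[str], domain: str, site: str, dsa_guid: str) -> Dict[str, str]:
    """Return the mnemonic -> DNS name map of records NOT suppressed by `avoid`."""
    avoid_set = set(avoid)
    unknown = avoid_set - RECORD_TEMPLATES.keys()
    if unknown:
        raise ValueError(f"unknown mnemonics: {sorted(unknown)}")
    return {
        name: template.format(domain=domain, site=site, dsa_guid=dsa_guid)
        for name, template in RECORD_TEMPLATES.items()
        if name not in avoid_set
    }


# Hypothetical avoid list for a lag DC: hide it from LDAP and Kerberos clients,
# but keep the DsaCname record so replication partners can still locate it.
LAG_DC_AVOID = ["Ldap", "LdapAtSite", "Dc", "DcAtSite", "Kdc", "KdcAtSite", "Rfc1510Kdc"]

if __name__ == "__main__":
    print(registered_records(LAG_DC_AVOID, "example.com", "LagSite", "1234-abcd"))
```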
>
> 3) OK, three notes.  LAG SITES can be done with DCs in a site with a 
> long replication interval, or by changing the replication WINDOW 
> (schedule).  It's a good idea to have TWO lag sites on alternating 
> frequencies, to avoid a situation where something awful happens just 
> before a lag site happens to replicate.  Someone detailed this 
> earlier, and it's a good note!
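A minimal sketch of why two staggered lag sites help, assuming each lag site pulls only from the production site (never from the other lag site) and that a scheduled replication delivers the deletion instantly. "Recovery window" here means how long after a deletion at least one lag DC still holds the pre-deletion data. The function names and the weekly schedule are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple

# Each lag site is (interval_hours, offset_hours): it pulls changes from the
# production site every `interval_hours`, starting at `offset_hours`.
LagSite = Tuple[float, float]


def next_replication(t: float, interval: float, offset: float) -> float:
    """First scheduled inbound replication strictly after time t."""
    return offset + (math.floor((t - offset) / interval) + 1) * interval


def recovery_window(deletion_time: float, lag_sites: Iterable[LagSite]) -> float:
    """Hours until the LAST lag site receives the deletion."""
    return max(next_replication(deletion_time, i, o) - deletion_time for i, o in lag_sites)


def worst_case_window(lag_sites: Iterable[LagSite], period: int) -> float:
    """Smallest recovery window over deletions at each whole hour of `period`."""
    sites = list(lag_sites)
    return min(recovery_window(t, sites) for t in range(period))


if __name__ == "__main__":
    weekly = [(168, 0)]                # one lag site, replicates once a week
    staggered = [(168, 0), (168, 84)]  # two weekly lag sites, half a week apart
    print(worst_case_window(weekly, 168))     # 1
    print(worst_case_window(staggered, 168))  # 85
```

With one weekly lag site, a deletion just before its replication leaves about an hour to react; staggering a second site by half the interval keeps the worst case at roughly half a week.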
>
> Dan
>  
>
> 
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Myrick, Todd (NIH/CC/DNA)
> Sent: Thursday, May 19, 2005 6:34 AM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site----Why?
>
> Is it cheaper and more efficient to go the replication lag site route 
> than to buy a proper backup and object-level restore solution?
>
> I don't mean to toot a vendor's horn, but Quest Recovery Manager turns 
> the process of restoring objects into a 15-minute click-click 
> operation.  I would hate to think of the number of steps you all must 
> do to reanimate an object in a directory using the "Recovery Site".
>
> From an operations standpoint, there is no substitute for a proper 
> backup solution and an object-level restore utility for AD.
>
> Thanks,
>
> Todd Myrick
>
> -----Original Message-----
> From: TIROA YANN [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 19, 2005 4:20 AM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site
>
> Neil,
>
> I now understand... I'm a new man now, thanks to the mysterious lag 
> site that has been revealed to me :-))
>
> Thanks a lot for your explanations.
>
> Best regards,
>
> Yann TIROA
>
> Centre de Ressources Informatique.
> Campus Scientifique de la DOUA.
> Gabriel Lippmann Building, 2nd floor, room 238.
> 43, Bd du 11 Novembre 1918.
> 69622 Villeurbanne Cedex.
>
> 
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ruston, Neil
> Sent: Thursday, May 19, 2005 10:09 AM
> To: '[email protected]'
> Subject: RE: [ActiveDir] AD DR - replication lag site
>
> If the deletion occurs on DC1, then a DC (DC2) in the lag site will 
> not receive the deletion immediately. You therefore have a window of 
> opportunity in which the deletion may be 'undone'.
>
> The deleted object may be auth restored on DC2 and thus replicated / 
> reanimated on DC1 (and any other DC which has received the deletion).
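A toy model of why this works (the real replication engine is more involved): marking DC2's copy authoritative raises its version number, so when the DCs next replicate, DC2's "not deleted" state beats DC1's deletion. Without the authoritative step, DC2's older copy simply loses and receives the deletion. ObjState, scenario and the version increment are hypothetical illustrations, not ntdsutil's actual values; in practice you must also keep the lag DC from pulling in the deletion first (for example by disabling its inbound replication), which the model skips.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ObjState:
    """Replicated state of one object's isDeleted attribute on one DC (simplified)."""
    is_deleted: bool
    version: int     # attribute version number, bumped on every originating write
    changed_at: int  # originating timestamp, arbitrary units


def newer(incoming: ObjState, local: ObjState) -> bool:
    """Simplified conflict resolution: higher version wins, then later timestamp.
    (Real AD has a further tie-breaker on the originating DC's ID.)"""
    return (incoming.version, incoming.changed_at) > (local.version, local.changed_at)


def replicate(source: ObjState, target: ObjState) -> ObjState:
    """State the target DC ends up with after pulling from the source DC."""
    return source if newer(source, target) else target


# Hypothetical bump; ntdsutil chooses its own (large) increment.
AUTH_RESTORE_BUMP = 100_000


def scenario(auth_restore: bool) -> Tuple[ObjState, ObjState]:
    """Return (dc1, dc2) after a deletion on DC1 and one round of replication."""
    dc1 = dc2 = ObjState(is_deleted=False, version=1, changed_at=0)
    # Someone deletes the object on DC1; DC2 (lag site) hasn't replicated yet.
    dc1 = ObjState(is_deleted=True, version=dc1.version + 1, changed_at=10)
    if auth_restore:
        # Mark DC2's copy authoritative: same data, much higher version.
        dc2 = replace(dc2, version=dc2.version + AUTH_RESTORE_BUMP, changed_at=20)
    # Each DC pulls from the other.
    return replicate(dc2, dc1), replicate(dc1, dc2)


if __name__ == "__main__":
    print(scenario(auth_restore=False)[0].is_deleted)  # True: the deletion wins
    print(scenario(auth_restore=True)[0].is_deleted)   # False: the object comes back
```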
>
> [My terminology may not be acceptable to some - I have deliberately 
> explained this in simplistic terms :)]
>
> neil
>
> 
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of TIROA YANN
> Sent: 19 May 2005 08:54
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site
>
> 
>
> Hello,
>
> I must apologize, but I'm a little bit confused. You said "With a lag 
> site, you ONLY have to do an authoritative restore (NTDSUTIL)".
>
> Do you mean that if I delete my OU on a DC in site A, all I have to do
> is an authoritative restore, not in site A, BUT on the DC in the lag
> site, reboot, and force replication to site A? And the non-authoritative
> restore is in fact the data already in the lag site, which explains your
> previous sentence? Wow! That's very clever!!
>
> Am I right ?
>
> Regards,
>
> Yann
>
> 
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dan Holme
> Sent: Thursday, May 19, 2005 8:51 AM
> To: [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site
>
> The major issue is the SPEED of recovery.  With a lag site, you ONLY 
> have to do an authoritative restore (NTDSUTIL).
>
> Without a lag site, you must first restore the AD from backup tape
> ('normal' restore), which can take quite some time!!!! Then, and only
> then, can you do the auth restore.
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of TIROA YANN
> Sent: Wednesday, May 18, 2005 11:46 PM
> To: [EMAIL PROTECTED]; [email protected]
> Subject: RE: [ActiveDir] AD DR - replication lag site
>
> Hello,
>
> Thanks for these interesting tips, but I didn't really understand the
> technology behind a lag site in the case of a simple deletion of an
> entire OU with many objects.
>
> For example, if I have an AD 2003 domain with 2 sites:
> Site A has 2 DCs
> Site B has one DC and is the lag site
> Between the 2 sites, I scheduled replication to occur once a week.
>
> In the case of an OU deletion, I go to the DC on which I made the
> deletion, do an authoritative restore in Directory Services Restore Mode
> and, after a reboot, wait for replication to take place in order to
> repopulate my whole domain with the restored OU. So how will the lag
> site help me in this situation?
>
> I can understand that a lag site will help me if all my DCs in site A 
> crashed. I would then take all the information from the lag site and
> restore it to site A, in effect "copying" my domain from the lag site by
> running dcpromo /adv on freshly installed DCs in site A, and so restore
> my whole domain.
> However, I think I would have more up-to-date information by restoring
> from yesterday's backup than from the lag site...
>
> So, could you help me better understand the technology behind a lag
> site? I think I'm misunderstanding something important ;-(
>
> Thank you for your feedback.
>
> Have a nice day :-)
>
> Regards,
>
> Yann
>
> List info   : http://www.activedir.org/List.aspx
> List FAQ    : http://www.activedir.org/ListFAQ.aspx
> List archive:
> http://www.mail-archive.com/activedir%40mail.activedir.org/
>
> ==============================================================================
> This message is for the sole use of the intended recipient.
> If you received this message in error please delete it and notify us.
> If this message was misdirected, Credit Suisse, its subsidiaries and 
> affiliates (CS) do not waive any confidentiality or privilege. CS 
> retains and monitors electronic communications sent through its 
> network.
> Instructions transmitted over this system are not binding on CS until 
> they are confirmed by us. Message transmission is not guaranteed to be
> secure.
> ==============================================================================
>
This e-mail and any attachment is for authorised use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be copied,
disclosed to, retained or used by, any other party. If you are not an
intended recipient then please promptly delete this e-mail and any
attachment and all copies and inform the sender. Thank you.
