> when you've inherited a forest with few domains, 
> what would you check in the first place to make sure, 
> things are running as they should?


I must be weird. Given the circumstance (walking in the door on an unknown
environment), I would tackle this completely differently than the other
posters have mentioned.

I would send an email to a couple of the systems people who were already
running it and ask what issues they are seeing or have seen. I would also
get all the documentation they have for the environment configuration that
they were aiming for.

I would create a new object in every partition (excluding schema) and in
each sysvol and in WINS (dynamic entry please, not static) and then let it
all replicate.

I would look at the configuration container and the sites / site link layout
to ascertain what the replication topology and theoretical latencies should
be. Looking for any oddball things like weird replication timing, bad
schedules, site link bridges, etc. The schedules would probably be the
biggest pain as I have nothing to decode those currently from the command
line but could probably tie perl and adfind together pretty quickly to do
so. If it was small enough I would just eyeball the outputs or actually use
the GUI.
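
A minimal sketch of decoding those schedules, in Python rather than the perl
and adfind pairing mentioned above. It assumes the raw schedule attribute
value has already been fetched as bytes, and it assumes the common
single-schedule layout: a 20-byte header followed by 168 bytes, one per hour
of the week starting Sunday 00:00 UTC, with the low four bits of each byte
marking the four 15-minute slots. The function names are made up for
illustration.

```python
import struct

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def decode_schedule(blob):
    """Return {day: [24 four-bit masks]} from a raw schedule value.

    Assumed layout: five little-endian 32-bit ints (size, bandwidth,
    schedule count, type, offset) and then 168 hour bytes at `offset`.
    """
    _size, _bandwidth, count, _type, offset = struct.unpack_from("<5I", blob, 0)
    if count != 1 or len(blob) < offset + 168:
        raise ValueError("unexpected schedule layout")
    hours = blob[offset:offset + 168]
    return {day: [b & 0x0F for b in hours[i * 24:(i + 1) * 24]]
            for i, day in enumerate(DAYS)}


def render(grid):
    """One row per day: '#' fully open hour, '.' closed hour, '+' partial."""
    def cell(mask):
        return "#" if mask == 0x0F else "." if mask == 0 else "+"
    return "\n".join(day + " " + "".join(cell(m) for m in grid[day])
                     for day in DAYS)
```

Printing render(decode_schedule(value)) for each site link gives a grid that
is quick to eyeball for odd replication windows. Times are UTC, so shift
them before comparing against local business hours.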

Then later (not much later, it depends on site topology) I would go looking
for all of the objects I created and, for the AD objects, the whenChanged
attribute so I can compare against the theoretical latencies. This also
tests the LDAP access to every DC, making sure they are responding properly.
Initially I assumed that but then figured I should document it so it is
obvious this is checking another aspect of the health. I would also query
every WINS server for the record I added with both netsh and
NBLOOKUP/NMBLOOKUP (one is a Microsoft supplied tool and one is a Samba port
to Win32) to test functionality on both WINS interfaces.
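
The latency comparison can be sketched roughly as below. This is only an
illustration: it assumes the whenChanged values have already been collected
per DC (by whatever LDAP tool is handy) as generalized-time strings, and
that the expected latency per DC has been worked out by hand from the site
link layout. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_generalized_time(value):
    """Parse an LDAP generalized time such as '20040604145300.0Z' (UTC)."""
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def latency_report(created, seen, expected):
    """Compare observed arrival times with the theoretical latency.

    created  -- generalized time at which the test object was written
    seen     -- {dc: whenChanged string, or None if the object is missing}
    expected -- {dc: timedelta of theoretical worst-case latency}
    """
    start = parse_generalized_time(created)
    report = {}
    for dc, stamp in seen.items():
        if stamp is None:
            report[dc] = "MISSING"
            continue
        delay = parse_generalized_time(stamp) - start
        if delay <= expected[dc]:
            report[dc] = "OK"
        else:
            report[dc] = "LATE by " + str(delay - expected[dc])
    return report
```

Each DC stamps whenChanged from its own clock, so clock skew between DCs
shows up as apparent latency. A DC reported as MISSING is the one to chase
first.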

I would run a little tool I call OOR against all DCs. It initially stood for
out of resources which was a huge issue I walked into once when I walked in
on an unknown environment. A good 80+ DCs were all reporting out of
resources when trying to do NET API type calls against them. The oor tool is
a simple perl script that loops through a domain and does a GetUserInfo for
the guest account against all DCs. 
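
The original oor tool is perl and is not reproduced here. The following is a
rough Python sketch of the same idea under stated assumptions: the DC list
is supplied by the caller, and the default probe uses NetUserGetInfo from
the pywin32 package (win32net), which only works on Windows with pywin32
installed. The probe is passed in as a parameter so the loop itself can be
tried out with a stand-in.

```python
def probe_net_api(dc, account="Guest"):
    """Ask one DC for basic info on an account through the NET API.

    Assumes Windows with pywin32 installed; the import is deferred so the
    rest of the sketch can be used without it.
    """
    import win32net
    win32net.NetUserGetInfo(dc, account, 1)


def sweep(dcs, probe=probe_net_api):
    """Run the probe against every DC and return {dc: None or error text}."""
    results = {}
    for dc in dcs:
        try:
            probe(dc)
            results[dc] = None
        except Exception as exc:  # record the failure and keep going
            results[dc] = type(exc).__name__ + ": " + str(exc)
    return results


def failures(results):
    """Names of the DCs whose probe raised an error, sorted."""
    return sorted(dc for dc, err in results.items() if err is not None)
```

The point is the same as the perl version: one cheap call per DC, and the
short list from failures() is where to start digging.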

This gets you a good solid baseline on how things are really working versus
grabbing a ton of info and munging through reports.

For any DC that didn't get the information, I would focus on looking at
replication info and, if necessary, chase into FRS or DNS.


That would be my first place stuff... After that, then I would dive into the
rest of this. 


After I know everything is basically functioning as expected, there is time
to look at the more detailed stuff, the things that aren't stopping
functionality but could be impacting performance, etc. This would be the
intensive work that the diag tools do: check every DC for every error, check
all of DNS, check all of FRS, etc. I would also start monitoring the DRA
Pending queue of each DC on a 2-3 minute swing to see if I have any serious
bottlenecks there that could be slowing things down.
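
As a sketch of what to do with those samples once they are collected: the
code below assumes the queue depth (presumably the NTDS performance counter
"DRA Pending Replication Synchronizations") has already been polled per DC
every 2-3 minutes by some other means, and simply flags DCs whose queue
stayed high for several polls in a row. The threshold and run length are
illustrative placeholders, not recommendations.

```python
def backlog_suspects(samples, threshold=50, min_consecutive=3):
    """Return DCs whose pending queue stayed above threshold.

    samples -- {dc: [queue depth per polling interval, oldest first]}
    A DC is flagged when at least min_consecutive samples in a row
    exceed threshold; a single spike is ignored.
    """
    suspects = []
    for dc, depths in samples.items():
        run = longest = 0
        for depth in depths:
            run = run + 1 if depth > threshold else 0
            longest = max(longest, run)
        if longest >= min_consecutive:
            suspects.append(dc)
    return sorted(suspects)
```

Ignoring single spikes matters because a busy DC will briefly queue work
after any burst of changes; it is the queue that never drains that points at
a bottleneck.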

What SP and hotfix levels are the DCs at? What functionality modes are you
in? Do you have enough GC coverage for the mode? I.e., in mixed mode you can
get by with one level of coverage; in native mode you may need more.

Now the real hard work begins... Finding out what AD permissions have been
delegated and to whom, and do they make sense? Do you have any serious
security holes because of it?

Audit all of the computer accounts and remove stale ones. Ditto for user
accounts. Maybe look at using oldcmp for BOTH of those things; yes, it will
do user accounts too if you use the -f option and specify a filter that
picks out users.
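
To make "stale" concrete, here is a small sketch that is not oldcmp and does
not claim to match its logic. It assumes password age is the staleness
signal, that pwdLastSet values have already been pulled for each account,
and that 90 days is the cutoff; all three are assumptions for illustration.
pwdLastSet is stored as a count of 100-nanosecond intervals since
1601-01-01 UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value):
    """Convert a 100-nanosecond interval count since 1601 to a UTC datetime."""
    return EPOCH_1601 + timedelta(microseconds=int(value) // 10)


def stale_accounts(accounts, now, max_age_days=90):
    """Return names whose password is older than max_age_days.

    accounts -- {name: pwdLastSet value}
    Accounts with a value of 0 (no recorded password change) are skipped
    here and need a separate look.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, value in accounts.items()
                  if int(value) != 0 and filetime_to_datetime(value) < cutoff)
```

Computer accounts normally change their password on their own on a regular
cycle, which is why an old pwdLastSet is a reasonable hint that a machine is
gone. For users it is a weaker signal, so treat the output as a list to
review rather than a list to delete.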

Audit the WINS records. Are any statics still needed? If not, remove them.

Ditto for DNS. 

What is the group strategy? How are they using them? For DLs? For security?
The old legacy UGLY method or something a little more updated? Heavy use of
GGs? Why? Doing role based stuff? Heavy use of UNIs? Do you have the proper
GC coverage for them?

Try to figure out which groups are and aren't being used, and try to clean
that up. If there aren't owners for all of the groups, find someone to own
them. Every object in AD should have an owner. If you can't find one, you as
Enterprise Admin now own it. Do you personally need it? No? Disable it. You
can disable a security group by making it a DL. It keeps its SID, it just
won't work as a security group anymore, and you can reenable it as a
security group if you find it is indeed needed and you have a new owner for
it.
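
Under the covers the security-versus-DL switch is one bit in the groupType
attribute. The sketch below only does the arithmetic on the value (writing
it back to the directory is left to whatever tool you use). It relies on the
documented flag 0x80000000 for security-enabled groups and on groupType
being a signed 32-bit integer; the function names are made up.

```python
SECURITY_ENABLED = 0x80000000  # groupType flag that marks a security group


def to_signed32(value):
    """Fold an unsigned 32-bit value into the signed form AD stores."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def as_distribution(group_type):
    """Clear the security flag, keeping the scope bits untouched."""
    return to_signed32((group_type & 0xFFFFFFFF) & ~SECURITY_ENABLED)


def as_security(group_type):
    """Set the security flag again, keeping the scope bits untouched."""
    return to_signed32((group_type & 0xFFFFFFFF) | SECURITY_ENABLED)
```

For example, a global security group has groupType -2147483646; cleared to a
DL it becomes 2, and setting the flag again returns the original value. The
scope bits (global, domain local, universal) are not changed either way.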

Look at the naming standards. There better be some. If not, set some. If
there are some, do they make sense or are they ad hoc? Naming standard
should exist for servers, clients, groups, OUs, sites, sitelinks. 

I have outlined 1-24 months of work here depending on how large the
environment, what tools are available, what problems exist, what work load
there is outside of this. I would make sure there was netmon or ethereal
available on every DC and any other server I supported and start doing basic
network traces of each of the subnets that the machines are on. 



Oh if there is Exchange in there that adds a bunch more and needs to be kept
in mind through the whole thing.


Now if in my next job I start hot dropping into sites the way this
describes, I would probably write a bunch of tools to help this process out
because it would seriously be a mishmosh of different things at the moment.
Heck, just having a tool to run on a network and get a good understanding of
what is there would be nice, though difficult.



   joe

 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Svetlana
Kouznetsova
Sent: Friday, June 04, 2004 10:53 AM
To: [EMAIL PROTECTED]
Subject: [ActiveDir] AD Health check


Hi,
In my quest to solve various problems in our forest while promoting W2K3 DC,
I've now come to the point when I want to ascertain overall current
situation in my AD and I need more general advice on :
What kind of tests one should do for checking the health of AD (W2K native
mode). As far as I can see, there are no certain compulsory things you need
to run in your AD from time to time - it all depends on time, skills and
perhaps, one's wish as well.

But maybe people can share their experience - when you've inherited a forest
with few domains, what would you check in the first place to make sure,
things are running as they should?

I can think of the basics, like 

Obvious event logs, dcdiag and netdiag
netdiag /debug /v - for basically, everything ?
dcdiag /test:fsmocheck - to test for all global role-holders are known and
responding
dcdiag /test:frssysvol - to test frs
dcdiag /test:registerindns /dnsdomain:domain - to test, if DC can register
DC Locator DNS records
nltest /dclist:domain_name - to see if DC can see the rest of the forest
nltest /dsgetdc:domain_name /gc - to see if DC can see GC servers in the
forest
nslookup -d - for testing DNS queries
repadmin /bind servername.domain - to test if DC can bind to others for
replication.

Perhaps, some of them are overkill, but I'm looking for a bit  more, then
just routine checkup.

Can you comment, please?

Thanks in advance
Lana.

List info   : http://www.activedir.org/mail_list.htm
List FAQ    : http://www.activedir.org/list_faq.htm
List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
