RE: [ActiveDir] Raid 1 tangent -- Vendor Domain

joe Sat, 22 Jul 2006 19:13:30 -0700

That's a command line guy for you... 

:o)


The thing is that I type in a very odd way too: my whole right hand but just
one or two fingers from my left hand. People tend to get a bit confused when
they see me type. 

 joe


--
O'Reilly Active Directory Third Edition -
http://www.joeware.net/win/ad3e.htm 
 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kevin Gent
Sent: Saturday, July 22, 2006 7:29 PM
To: [email protected]
Subject: Re: [ActiveDir] Raid 1 tangent -- Vendor Domain

joe,

you must type really, really fast............

----- Original Message ----- 
From: "Albert Duro" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, July 22, 2006 7:06 PM
Subject: Re: [ActiveDir] Raid 1 tangent -- Vendor Domain


> no debate from me.  I was just asking.  Thank you for the lesson.
>
> ----- Original Message ----- 
> From: "joe" <[EMAIL PROTECTED]>
> To: <[email protected]>
> Sent: Saturday, July 22, 2006 9:48 AM
> Subject: RE: [ActiveDir] Raid 1 tangent -- Vendor Domain
>
>
>> Mirrors don't scale.
>>
>> Microsoft's deployment doc mostly just talks about using mirrors (small 
>> nod
>> to RAID 10/0+1) so everyone thinks that they should build their Corporate
>> DCs on mirrors, usually 3 - OS, Logs, and DIT. Very few people if anyone
>> would build a corporate Exchange Server on mirrors... Why not? The DB is 
>> the
>> same under both of them... What is critical to Exchange? IOPS and that 
>> means
>> spindles. If something is really beating on AD and the entire DIT can't 
>> be
>> cached, IOPS are critical to AD as well. The main difference is that AD 
>> is
>> mostly random read and Exchange is heavy writing and reading. The 
>> exception
>> to this is the edge case of Eric's big DIT[1] in which he dumped 2TB of 
>> data
>> into AD in a month at which point he did something that few people see,
>> pushed the IOPS on the log drive through the roof.
>>
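To put rough numbers on "IOPS means spindles", here is a back-of-the-envelope sketch in Python. The seek time and RPM figures are illustrative assumptions, not measurements from this thread, and the model ignores controller cache and command reordering.

```python
import math


def disk_random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random IOPS for one spindle: 1 / (avg seek + avg rotational latency)."""
    # Average rotational latency is half a revolution, expressed in milliseconds.
    rotational_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_ms)


def spindles_needed(target_iops: float, per_disk_iops: float) -> int:
    """Spindles required to serve target_iops of random reads."""
    return math.ceil(target_iops / per_disk_iops)


if __name__ == "__main__":
    # Assumed drive: 15k RPM with a 3.5 ms average seek (hypothetical figures).
    per_disk = disk_random_iops(avg_seek_ms=3.5, rpm=15_000)
    print(f"approx. random IOPS per spindle: {per_disk:.0f}")
    print(f"spindles for 1000 random reads/sec: {spindles_needed(1000, per_disk)}")
```

The point of the sketch is only that random read capacity grows roughly linearly with the number of disks, which is why a two-disk mirror runs out of headroom quickly.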
>> In a smaller environment (very low thousands), or for a low use DC (small
>> WAN site), or a DC with a DIT fully cached a RAID-1 drive for DIT will
>> probably be sufficient, you will note that the only numbers mentioned in 
>> the
>> deployment guide are about 5000[2]... That usually means a small DIT and 
>> it
>> is extremely likely that a K3 DC will cache the entire DIT. Plus the 
>> usage
>> is probably such that the IO capability of two spindles will likely be 
>> ok.
>> Let me state though that even in a small user environment if there was an
>> intensive directory based app or a buttload of data that pushes the DIT 
>> into
>> GB's instead of MBs I would still be watching my disk queueing pretty 
>> close
>> as well as the Read and Write Ops.
>>
>> AD admins who aren't running directory intensive apps (read as Exchange
>> 2000+) usually don't see any issues but then again most aren't looking very
>> closely at the counters because they haven't had a reason to and even if
>> they had some short lived issues they probably wouldn't go look at the
>> counters. At least that has been my experience in dealing with companies. I
>> will admit that prior to implementing Exchange when I did AD Ops with a
>> rather large company I didn't once look at the disk counters, didn't care,
>> everything ran perfectly well and about the only measure of perf was
>> replication latency and does ADUC start fast enough and it always was fine
>> there unless there were network related issues or a DC was having hardware
>> failure.
>>
>> Enter Exchange... Or some other app that pounds your DCs with millions of
>> queries a day and tiny little bits of latency that you didn't previously
>> feel start having an impact. You won't feel 70-80ms of latency in 
>> anything
>> you are doing with normal AD tools or NOS ops, not at all. You will feel
>> that with Exchange (and other heavy directory use apps), often with 
>> painful
>> results unless it isn't consistent and the directory can unwind itself 
>> again
>> and hence allow Exchange to then unwind itself.
>>
>> Now let me point out, I don't deal with tiny companies for work, small to me
>> is less than 40-50k. The smallest I tend to deal with is about 30k. I
>> usually get called to walk in to Exchange issues where Exchange is
>> underperforming or outright hanging, sometimes for hours at a time. There
>> can be all sorts of issues causing this such as
>>
>> O poor disk subsystem design for Exchange (someone, say, got fancy with a SAN
>> layout and really didn't know what they were doing seems to be popular here)
>>
>> O hardware/drivers on the Exchange server just aren't working properly and
>> the drivers are experiencing timeout issues (for some reason I want to say
>> HBA here)
>>
>> O poor network configurations and odd load balancing solutions, etc. that
>> generate a whole bunch of, say, keep alive traffic on the segment that no one
>> had any idea about because no one understood the solution nor took time to
>> look at the network traces. Or maybe the infamous Full/100 on one end and
>> half/100 on the other. Whatever.
>>
>> O Applications that beat the crap out of Exchange that weren't accounted for
>> in the design well or at all... such as Blackberry or Desktop Search or
>> various Archive solutions
>>
>> O Poorly written event sinks, disclaimer type products that query AD
>> themselves for additional info fit nicely into this category (hint do not
>> deploy one of these unless you understand the queries it generates)
>>
>> O DCs being too far away, say like an Exchange server in the US hosting APAC
>> users. If you are running Exchange, you put Exchange and the DCs for the
>> domains of any users on that Exchange server on the same physical subnet.
>> And if you have a multidomain forest, strongly consider shortcut trusts
>> between the domains that the Exchange servers are in with the domains the
>> users are in.
>>
>> O DCs underperforming
>>
>> The last is almost always, heck, I will say in 98% of the cases I have 
>> had
>> to investigate, related to DC disk configuration and it is always a 
>> mirrored
>> setup. In fact, almost always it is the deployment guide recommendation 
>> of
>> mirror for OS, mirror for logs, and mirror for DIT. Then you look at the
>> perf and you see that the counters on the DIT disk are in the nose bleed
>> seats and you are getting maybe 150 ops per second through the DIT disk 
>> and
>> counters on the OS and log drives can't even be viewed unless you use the
>> multiplier to boost them in perfmon because they are dead asleep with an
>> occasional bump to let you know they aren't outright dead.
>>
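The imbalance described above (DIT disk saturated, OS and log disks idle) can be expressed as a share of total disk operations per volume. The sketch below is a minimal illustration in Python: the 150 ops/sec figure for the DIT comes from the post, while the OS and log numbers and the 0.8 threshold are made-up assumptions, and the input is presumed to be averages you already collected from perfmon.

```python
from typing import Dict, List


def io_share(ops_per_sec: Dict[str, float]) -> Dict[str, float]:
    """Fraction of total disk operations handled by each volume."""
    total = sum(ops_per_sec.values())
    if total == 0:
        return {volume: 0.0 for volume in ops_per_sec}
    return {volume: ops / total for volume, ops in ops_per_sec.items()}


def hot_volumes(ops_per_sec: Dict[str, float], share_threshold: float = 0.8) -> List[str]:
    """Volumes carrying at least share_threshold of all disk operations."""
    shares = io_share(ops_per_sec)
    return [volume for volume, share in shares.items() if share >= share_threshold]


if __name__ == "__main__":
    # Hypothetical averages for a three-mirror DC layout.
    sample = {"OS": 3.0, "Logs": 5.0, "DIT": 150.0}
    for volume, share in io_share(sample).items():
        print(f"{volume}: {share:.1%} of disk ops")
    print("hot volumes:", hot_volumes(sample))
```

When one volume carries nearly all the operations, the spindles dedicated to the others are doing almost nothing for directory performance.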
>> The logic is fairly sound if you don't probe it too deeply, of course you
>> want the OS by itself because you don't want it impacting the directory 
>> perf
>> and you don't want directory perf impacting the OS. The logs are 
>> sequential
>> and the DIT is random so you don't want to mix and match those as you 
>> will
>> impact log perf. But then you look at the counters and again, the OS is
>> sleeping and the Logs are sleeping and the DIT is on fire with no water in
>> sight. What do you need for the DIT in this condition? Gold star to whoever
>> said "Available IOPS capability" first... How do you add the capacity for
>> more IOPS? You add spindles. How many IOPS do you need for your DIT? Good
>> question, I have never seen a document that starts to help you guess that as
>> profiling DC usage is tough. Probably tougher than profiling Exchange usage
>> where you do hear a lot about how much IOPS capability you need. If you want
>> a nice baseline of how many you need, start with as many as you can freakin
>> get in the box you have available. I.e. every slot you have for a disk you
>> put a disk into and give that to the DIT drive, the OS and Logs fit in
>> wherever there is room and they don't get dedicated slots. You will not be
>> penalized for having too much capacity for reading the DIT.
>>
>> So then I spend 1 day to 3 weeks trying to convince the folks that AD is
>> causing an issue even though LDP, ADSIEDIT, etc[3] fires up properly and
>> seemingly quickly and people assure me that before Exchange came around
>> everything worked great and everyone was happy so obviously the DCs are 
>> fine
>> and it is Exchange that is the problem. If I can't prove things with the
>> counters I usually have to prove it with a little script I have that 
>> sends
>> queries to the DCs in a couple of sites (some with Exchange and some
>> without) every 1-5 minutes and generates a little simple graph showing 
>> the
>> response times. Currently this only has a resolution of seconds because 
>> it
>> requires spinning up an outside executable and perl does seconds easily 
>> for
>> the timers. However, it is usually quite rare that I don't have a graph 
>> at
>> the end of the week that helps me determine the usual interval for online
>> defrag for each DC as well as when the users are logging into Exchange. For
>> the most part the non-Exchange DCs all show response times in that graph of
>> 1-2 seconds (again recall the resolution is seconds, the actual responses to
>> the multiple queries are subsecond) and the DCs in the Exchange sites will
>> be 1-2 seconds except in the mornings (or during heavy DL periods) at which
>> point I have seen timings of 4, 5, 6, 7 seconds and sometimes as bad as 15,
>> 20, 30 seconds. Let me put it this way, if it takes a couple of seconds to
>> return a simple query of a couple of attributes of your schema... There is
>> an issue regardless of whether your NOS users feel it or not or if some
>> admin tool works ok.
>>
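The probing approach described above (query a handful of DCs on a fixed interval and record how long each takes) can be sketched in Python as follows. This is not the script from the post, which was Perl calling an outside executable. The DC names and `fake_query` are hypothetical stand-ins, and a real version would replace `fake_query` with an actual LDAP search. Timing in-process with `time.perf_counter` gives subsecond resolution.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Query = Callable[[str], object]


def time_query(query: Query, dc: str) -> float:
    """Elapsed seconds for one query against one DC."""
    start = time.perf_counter()
    query(dc)
    return time.perf_counter() - start


def probe(dcs: Iterable[str], query: Query) -> Dict[str, float]:
    """One round of timings, keyed by DC name."""
    return {dc: time_query(query, dc) for dc in dcs}


def slow_dcs(samples: Dict[str, float], threshold_s: float) -> List[str]:
    """DCs whose response time met or exceeded the threshold."""
    return sorted(dc for dc, elapsed in samples.items() if elapsed >= threshold_s)


def run(dcs: List[str], query: Query, interval_s: float, rounds: int,
        sleep: Callable[[float], None] = time.sleep
        ) -> List[Tuple[float, Dict[str, float]]]:
    """Collect `rounds` probes, pausing interval_s between them."""
    history = []
    for i in range(rounds):
        history.append((time.time(), probe(dcs, query)))
        if i + 1 < rounds:
            sleep(interval_s)
    return history


def fake_query(dc: str) -> None:
    # Hypothetical stand-in for an LDAP search; one DC is made artificially slow.
    time.sleep(0.05 if dc == "dc-exch-01" else 0.0)


if __name__ == "__main__":
    for stamp, samples in run(["dc-exch-01", "dc-branch-01"], fake_query,
                              interval_s=1, rounds=2):
        clock = time.strftime("%H:%M:%S", time.localtime(stamp))
        print(clock, samples, "slow:", slow_dcs(samples, threshold_s=0.04))
```

Plotting the collected history per DC over a week is what would surface patterns like the morning logon spike.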
>> So finally someone says, what can we do? I say, rebuild the disk array 
>> with
>> a single RAID 10 or 0+1, you pick, I don't care about anything other than
>> the perf and they are identical, you can argue out the redundancy points
>> amongst yourselves. If that isn't an option, I say use RAID-5. Anything 
>> that
>> throws multiple spindles at the DIT. I lump it all together, OS, DIT, and
>> Logs. There is no reason you should be protecting your OS and Logs such 
>> that
>> they are sleeping while the DIT is burning. If a DC isn't running AD very
>> well, I don't care if the OS is running well, it is a moot point. As for 
>> the
>> logs... They are a rounding error unless you are like Eric and really 
>> like
>> playing with your DIT by pounding it with writes.
>>
>> Between RAID 10/0+1 and 5, from the numbers I have seen, 10/0+1 tends to
>> enjoy somewhere in the area of a 2-10% perf advantage in available ops with
>> the same number of spindles used. Usually what you see though is that you
>> have say a machine with 6 disk capability and you will see a 5+1 RAID-5 (+1
>> is the hotspare) or a 4+2 RAID-10/0+1. Right off the RAID-5 has the benefit
>> of having an additional spindle over the RAID-10/0+1 configuration so it
>> should outperform the RAID-10/0+1. Me, for a staffed class-A datacenter with
>> a 6 disk internal capability I would run a 6 disk RAID-10/0+1 then if that
>> wasn't ok a 6 disk RAID-5. Hot spares are for sites where you have no clue
>> how long it will take to get someone in to change the disk. If you have a
>> staffed datacenter it shouldn't take more than 60 minutes to get a disk
>> swapped, really it shouldn't be but 10-20 minutes. That is what all that
>> monitoring and 24x7x365.25 staff is about.
>>
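The six-slot comparison above can be laid out with a simple model in Python. It assumes random reads scale with the number of active spindles and uses the commonly cited write penalties of 2 for RAID-10 and 4 for RAID-5. The 180 IOPS per disk figure is an assumption for illustration. Note that under this model equal spindle counts give equal read numbers, so it does not capture the 2-10% edge for RAID 10/0+1 mentioned in the post.

```python
from dataclasses import dataclass

# Back-end I/Os generated per host write (commonly cited rule of thumb).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


@dataclass(frozen=True)
class Layout:
    name: str
    level: str         # "raid10" or "raid5"
    active_disks: int  # spindles serving I/O; hot spares are excluded


def read_iops(layout: Layout, per_disk_iops: float) -> float:
    """Random read capacity: every active spindle can serve reads."""
    return layout.active_disks * per_disk_iops


def write_iops(layout: Layout, per_disk_iops: float) -> float:
    """Random write capacity after applying the RAID write penalty."""
    return layout.active_disks * per_disk_iops / WRITE_PENALTY[layout.level]


if __name__ == "__main__":
    per_disk = 180.0  # assumed per-spindle random IOPS
    layouts = [
        Layout("5+1 RAID-5 (one hot spare)", "raid5", 5),
        Layout("4+2 RAID-10", "raid10", 4),
        Layout("6 disk RAID-10", "raid10", 6),
        Layout("6 disk RAID-5", "raid5", 6),
    ]
    for layout in layouts:
        print(f"{layout.name}: reads {read_iops(layout, per_disk):.0f}/s, "
              f"writes {write_iops(layout, per_disk):.0f}/s")
```

For a mostly random-read workload like AD the read column is the one that matters, which is why putting every slot into the array comes out ahead of holding slots back.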
>> Oh... One more thing before I wrap this... You don't get perf gain from
>> logically partitioning a single RAID array. I have seen deployments where
>> they actually went with a multiple spindle disk configuration and then 
>> broke
>> the OS, Logs, and DIT up into different volumes within the OS... OS I am
>> fine with, it is a nice mental breakout of that aspect, but the points in
>> separating the LOGs and the DIT aren't that great that I am aware of 
>> unless
>> you expect to run your DIT out of space and you really shouldn't be 
>> thinking
>> about doing that (again monitoring but also protecting your directory 
>> from
>> letting people add things unhindered). Certainly breaking things out by
>> volume isn't a perf gain and personally I think it adds to the design
>> complexity needlessly.
>>
>> So if your DIT is under 1.5GB and you have the RAM to cache that DIT on K3
>> AD then a mirror will probably be fine for you. If the DIT is under, what is
>> it, about 2.7 or so GB, and you have the RAM and /3GB on K3 AD enabled then
>> a mirror will probably be fine for you. If you have a WAN site that has some
>> basic users logging on and getting GPOs and accessing file shares locally, a
>> mirror will probably be fine for you. If you are just doing NOS stuff then a
>> mirror may be fine for you even in real large orgs. If you are outside of
>> those criteria, think hard about whether a mirror is right for you and prove
>> that out by watching the disk counters. If you have Exchange beating against
>> your AD and it can't be cached, a mirror is most likely not going to be as
>> performant as it should be for *optimal* Exchange performance.
>>
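The rules of thumb in the paragraph above can be written down as a small decision helper in Python. The 1.5GB and roughly 2.7GB limits are taken from the post (the second is approximate there too). The function name, parameters, and the order in which the conditions are checked are assumptions made for this sketch.

```python
def mirror_verdict(dit_gb: float,
                   ram_can_hold_dit: bool,
                   three_gb_switch: bool = False,
                   basic_wan_site: bool = False,
                   nos_only: bool = False,
                   heavy_directory_app: bool = False) -> str:
    """Rule-of-thumb verdict on a RAID-1 DIT volume for a 32-bit K3 DC."""
    # Cacheable DIT size per the post: ~1.5GB, or ~2.7GB with /3GB enabled.
    cache_limit_gb = 2.7 if three_gb_switch else 1.5
    fully_cached = ram_can_hold_dit and dit_gb < cache_limit_gb

    if fully_cached:
        return "mirror probably fine"
    if heavy_directory_app:
        # e.g. Exchange beating on a DIT that cannot be cached
        return "mirror most likely not optimal"
    if basic_wan_site:
        return "mirror probably fine"
    if nos_only:
        return "mirror may be fine"
    return "think hard and watch the disk counters"


if __name__ == "__main__":
    print(mirror_verdict(1.0, ram_can_hold_dit=True))
    print(mirror_verdict(2.0, ram_can_hold_dit=True, three_gb_switch=True))
    print(mirror_verdict(6.0, ram_can_hold_dit=False, heavy_directory_app=True))
```

Whatever the helper returns, the disk counters remain the real test, as the post says.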
>> I say optimal because Exchange may appear to be fine but as I often tell
>> people, Exchange will put up with a lot of stupid things until it hits 
>> the
>> limit and then it will throw a fit and blow out completely on you and you
>> have to chase through and figure out out of all the stupid things you are
>> doing, which one is the one pushing it over the edge this time so you can
>> fix it (reminds me of some relationships I know of with girls taking on 
>> the
>> part of Exchange and guys taking on the part of doing lots of stupid
>> things<eg>).
>>
>> I don't have a lot of experience yet with x64 DCs but my gut says that
>> assuming you have enough RAM to cache the entire DIT and you aren't
>> constantly rebooting the DC or doing things that force the cache to be
>> trimmed, the disk subsystem is really only going to be important for 
>> writes
>> (which we have already said aren't really all that much of what AD is 
>> doing)
>> and the initial caching of the DIT.
>>
>> Let the debates begin. :)
>>
>>
>>  joe
>>
>>
>>
>>
>>
>> [1] http://blogs.technet.com/efleis/archive/2006/06/08/434255.aspx
>>
>> [2] BTW, I read that 5000 as total users using AD, not users using that 
>> one
>> DC. The more users you have, the more likely your DIT is going to hit a 
>> size
>> that can't be cached.
>>
>> [3] Even in one case adfind was used to prove AD was fine and the person
>> didn't know I wrote it... That was an interesting conversation as the 
>> person
>> tried to explain to me how ADFIND worked and then I explained he was 
>> wrong
>> and laid out the actual algorithm for what it was doing and he said I was
>> wrong and I said I hope not, I wrote it.
>>
>>
>> --
>> O'Reilly Active Directory Third Edition -
>> http://www.joeware.net/win/ad3e.htm
>>
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> [EMAIL PROTECTED]
>> Sent: Saturday, July 22, 2006 11:06 AM
>> To: [email protected]
>> Subject: RE: [ActiveDir] Raid 1 tangent -- Vendor Domain
>>
>> "- stop using mirrors damnit) ."[1]
>>
>>
>> can you please explain that?  What's wrong with mirrors?
>>
>> [1] joe, speaking particularly in the context of Exchange
>> List info   : http://www.activedir.org/List.aspx
>> List FAQ    : http://www.activedir.org/ListFAQ.aspx
>> List archive: http://www.activedir.org/ml/threads.aspx
>>
>
>

