That's a command line guy for you... :o)
The thing is that I type in a very odd way too: my whole right hand and just one or two fingers from my left hand. People tend to get a bit confused when they see me type.

  joe

--
O'Reilly Active Directory Third Edition - http://www.joeware.net/win/ad3e.htm

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Gent
Sent: Saturday, July 22, 2006 7:29 PM
To: [email protected]
Subject: Re: [ActiveDir] Raid 1 tangent -- Vendor Domain

joe, you must type really, really fast............

----- Original Message -----
From: "Albert Duro" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, July 22, 2006 7:06 PM
Subject: Re: [ActiveDir] Raid 1 tangent -- Vendor Domain

> no debate from me. I was just asking. Thank you for the lesson.
>
> ----- Original Message -----
> From: "joe" <[EMAIL PROTECTED]>
> To: <[email protected]>
> Sent: Saturday, July 22, 2006 9:48 AM
> Subject: RE: [ActiveDir] Raid 1 tangent -- Vendor Domain
>
>> Mirrors don't scale.
>>
>> Microsoft's deployment doc mostly just talks about using mirrors (small nod to RAID 10/0+1) so everyone thinks that they should build their Corporate DCs on mirrors, usually 3 - OS, Logs, and DIT. Very few people if anyone would build a corporate Exchange Server on mirrors... Why not? The DB is the same under both of them... What is critical to Exchange? IOPS and that means spindles. If something is really beating on AD and the entire DIT can't be cached, IOPS are critical to AD as well. The main difference is that AD is mostly random read and Exchange is heavy writing and reading. The exception to this is the edge case of Eric's big DIT[1] in which he dumped 2TB of data into AD in a month at which point he did something that few people see, pushed the IOPS on the log drive through the roof.
>>
>> In a smaller environment (very low thousands), or for a low use DC (small WAN site), or a DC with a DIT fully cached, a RAID-1 drive for DIT will probably be sufficient. You will note that the only numbers mentioned in the deployment guide are about 5000[2]... That usually means a small DIT and it is extremely likely that a K3 DC will cache the entire DIT. Plus the usage is probably such that the IO capability of two spindles will likely be ok. Let me state though that even in a small user environment if there was an intensive directory based app or a buttload of data that pushes the DIT into GBs instead of MBs I would still be watching my disk queueing pretty close as well as the Read and Write Ops.
>>
>> AD admins who aren't running directory intensive apps (read as Exchange 2000+) usually don't see any issues but then again most aren't looking very closely at the counters because they haven't had a reason to and even if they had some short lived issues they probably wouldn't go look at the counters. At least that has been my experience in dealing with companies. I will admit that prior to implementing Exchange when I did AD Ops with a rather large company I didn't once look at the disk counters, didn't care, everything ran perfectly well and about the only measure of perf was replication latency and does ADUC start fast enough and it always was fine there unless there were network related issues or a DC was having hardware failure.
>>
>> Enter Exchange... Or some other app that pounds your DCs with millions of queries a day and tiny little bits of latency that you didn't previously feel start having an impact. You won't feel 70-80ms of latency in anything you are doing with normal AD tools or NOS ops, not at all.
>> You will feel that with Exchange (and other heavy directory use apps), often with painful results unless it isn't consistent and the directory can unwind itself again and hence allow Exchange to then unwind itself.
>>
>> Now let me point out, I don't deal with tiny companies for work, small to me is less than 40-50k. The smallest I tend to deal with is about 30k. I usually get called to walk in to Exchange issues where Exchange is underperforming or outright hanging, sometimes for hours at a time. There can be all sorts of issues causing this such as
>>
>> O poor disk subsystem design for Exchange (someone say got fancy with a SAN layout and really didn't know what they were doing seems to be popular here)
>>
>> O hardware/drivers on the Exchange server just aren't working properly and the drivers are experiencing timeout issues (for some reason I want to say HBA here)
>>
>> O poor network configurations and odd load balancing solutions, etc that generate a whole bunch of say keep alive traffic on the segment that no one had any idea about because no one understood the solution nor took time to look at the network traces. Or maybe the infamous Full/100 on one end and half/100 on the other. Whatever.
>>
>> O Applications that beat the crap out of Exchange that weren't accounted for in the design well or at all... such as Blackberry or Desktop Search or various Archive solutions
>>
>> O Poorly written event sinks, disclaimer type products that query AD themselves for additional info fit nicely into this category (hint do not deploy one of these unless you understand the queries it generates)
>>
>> O DCs being too far away say like an Exchange server in the US hosting APAC users. If you are running Exchange, you put Exchange and the DCs for the domains of any users on that Exchange server on the same physical subnet.
>> And if you have a multidomain forest, strongly consider shortcut trusts between the domains that the Exchange servers are in with the domains the users are in.
>>
>> O DCs underperforming
>>
>> The last is almost always, heck, I will say in 98% of the cases I have had to investigate, related to DC disk configuration and it is always a mirrored setup. In fact, almost always it is the deployment guide recommendation of mirror for OS, mirror for logs, and mirror for DIT. Then you look at the perf and you see that the counters on the DIT disk are in the nose bleed seats and you are getting maybe 150 ops per second through the DIT disk and counters on the OS and log drives can't even be viewed unless you use the multiplier to boost them in perfmon because they are dead asleep with an occasional bump to let you know they aren't outright dead.
>>
>> The logic is fairly sound if you don't probe it too deeply, of course you want the OS by itself because you don't want it impacting the directory perf and you don't want directory perf impacting the OS. The logs are sequential and the DIT is random so you don't want to mix and match those as you will impact log perf. But then you look at the counters and again, the OS is sleeping and the Logs are sleeping and the DIT is on fire with no water in sight. What do you need for the DIT in this condition? Gold star to whoever said "Available IOPS capability" first... How do you add the capacity for more IOPS? You add spindles. How many IOPS do you need for your DIT? Good question, I have never seen a document that starts to help you guess that as profiling DC usage is tough. Probably tougher than profiling Exchange usage where you do hear a lot about how many IOPS capability you need. If you want a nice baseline of how many you need, start with as many as you can freakin get in the box you have available. I.E.
>> Every slot you have for a disk you put a disk into and give that to the DIT drive, the OS and Logs fit in wherever there is room and they don't get dedicated slots. You will not be penalized for having too much capacity for reading the DIT.
>>
>> So then I spend 1 day to 3 weeks trying to convince the folks that AD is causing an issue even though LDP, ADSIEDIT, etc[3] fires up properly and seemingly quickly and people assure me that before Exchange came around everything worked great and everyone was happy so obviously the DCs are fine and it is Exchange that is the problem. If I can't prove things with the counters I usually have to prove it with a little script I have that sends queries to the DCs in a couple of sites (some with Exchange and some without) every 1-5 minutes and generates a little simple graph showing the response times. Currently this only has a resolution of seconds because it requires spinning up an outside executable and perl does seconds easily for the timers. However, it is usually quite rare that I don't have a graph at the end of the week that helps me determine the usual interval for online defrag for each DC as well as when the users are logging into Exchange. For the most part the non-Exchange DCs are all showing response times in that graph of 1-2 seconds (again recall the resolution is seconds, the actual responses to the multiple queries are subsecond) and the Exchange servers will be 1-2 seconds except in the mornings (or during heavy DL periods) at which point I have seen timings of 4,5,6,7 seconds and sometimes as bad as 15,20,30 seconds. Let me put it this way, if it takes a couple of seconds to return a simple query of a couple of attributes of your schema... There is an issue regardless of whether your NOS users feel it or not or if some admin tool works ok.
>>
>> So finally someone says, what can we do?
>> I say, rebuild the disk array with a single RAID 10 or 0+1, you pick, I don't care about anything other than the perf and they are identical, you can argue out the redundancy points amongst yourselves. If that isn't an option, I say use RAID-5. Anything that throws multiple spindles at the DIT. I lump it all together, OS, DIT, and Logs. There is no reason you should be protecting your OS and Logs such that they are sleeping while the DIT is burning. If a DC isn't running AD very well, I don't care if the OS is running well, it is a moot point. As for the logs... They are a rounding error unless you are like Eric and really like playing with your DIT by pounding it with writes.
>>
>> Between RAID 10/0+1 and 5, from the numbers I have seen, 10/0+1 tends to enjoy somewhere in the area of a 2-10% perf gain for available OPS with the same number of spindles used. Usually what you see though is that you have say a machine with 6 disk capability and you will see a 5+1 RAID-5 (+1 is the hotspare) or a 4+2 RAID-10/0+1. Right off the RAID-5 has the benefit of having an additional spindle over the RAID-10/0+1 configuration so it should outperform the RAID-10/0+1. Me, for a staffed class-A datacenter with a 6 disk internal capability I would run a 6 disk RAID-10/0+1 then if that wasn't ok a 6 disk RAID-5. Hot spares are for sites where you have no clue how long it will take to get someone in to change the disk. If you have a staffed datacenter it shouldn't take more than 60 minutes to get a disk swapped, really it shouldn't be but 10-20 minutes. That is what all that monitoring and 24x7x365.25 staff is about.
>>
>> Oh... One more thing before I wrap this... You don't get perf gain from logically partitioning a single RAID array.
>> I have seen deployments where they actually went with a multiple spindle disk configuration and then broke the OS, Logs, and DIT up into different volumes within the OS... OS I am fine with, it is a nice mental breakout of that aspect, but the points in separating the LOGs and the DIT aren't that great that I am aware of unless you expect to run your DIT out of space and you really shouldn't be thinking about doing that (again monitoring but also protecting your directory from letting people add things unhindered). Certainly breaking things out by volume isn't a perf gain and personally I think it adds to the design complexity needlessly.
>>
>> So if your DIT is under 1.5GB and you have the RAM to cache that DIT on K3 AD then a mirror will probably be fine for you. If the DIT is under, what is it about, 2.7 or so GB, and you have the RAM and /3GB on K3 AD enabled then a mirror will probably be fine for you. If you have a WAN site that has some basic users logging on and getting GPOs and accessing file shares locally, a mirror will probably be fine for you. If you are just doing NOS stuff then a mirror may be fine for you even in real large orgs. If you are outside of that criteria, think hard about whether a mirror is right for you and prove that out by watching the disk counters. If you have Exchange beating against your AD and it can't be cached, a mirror is most likely not going to be as performant as it should be for *optimal* Exchange performance.
>>
>> I say optimal because Exchange may appear to be fine but as I often tell people, Exchange will put up with a lot of stupid things until it hits the limit and then it will throw a fit and blow out completely on you and you have to chase through and figure out, out of all the stupid things you are doing, which one is the one pushing it over the edge this time so you can fix it (reminds me of some relationships I know of with girls taking on the part of Exchange and guys taking on the part of doing lots of stupid things<eg>).
>>
>> I don't have a lot of experience yet with x64 DCs but my gut says that assuming you have enough RAM to cache the entire DIT and you aren't constantly rebooting the DC or doing things that force the cache to be trimmed, the disk subsystem is really only going to be important for writes (which we have already said aren't really all that much of what AD is doing) and the initial caching of the DIT.
>>
>> Let the debates begin. :)
>>
>> joe
>>
>> [1] http://blogs.technet.com/efleis/archive/2006/06/08/434255.aspx
>>
>> [2] BTW, I read that 5000 as total users using AD, not users using that one DC. The more users you have, the more likely your DIT is going to hit a size that can't be cached.
>>
>> [3] Even in one case adfind was used to prove AD was fine and the person didn't know I wrote it... That was an interesting conversation as the person tried to explain to me how ADFIND worked and then I explained he was wrong and laid out the actual algorithm for what it was doing and he said I was wrong and I said I hope not, I wrote it.
>>
>> --
>> O'Reilly Active Directory Third Edition - http://www.joeware.net/win/ad3e.htm
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
>> Sent: Saturday, July 22, 2006 11:06 AM
>> To: [email protected]
>> Subject: RE: [ActiveDir] Raid 1 tangent -- Vendor Domain
>>
>> "- stop using mirrors damnit) ."[1]
>>
>> can you please explain that? What's wrong with mirrors?
>>
>> [1] joe, speaking particularly in the context of Exchange
>>
>> List info : http://www.activedir.org/List.aspx
>> List FAQ : http://www.activedir.org/ListFAQ.aspx
>> List archive: http://www.activedir.org/ml/threads.aspx
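
For anyone who wants to try the response-time check joe describes in the quoted message (time a simple query against each DC every few minutes, then graph the results), here is a rough Python sketch. It is not joe's perl script. The names probe, time_query and text_graph are made up for illustration, and the query itself is deliberately left as a callable you supply (for example a wrapper around whatever LDAP client or command line tool you already use), so nothing here assumes a particular directory API. Since it times in-process with time.perf_counter it can show subsecond values, unlike the seconds resolution mentioned in the thread.

```python
import time


def time_query(run_query, clock=time.perf_counter):
    """Return the elapsed seconds for a single call to run_query()."""
    start = clock()
    run_query()
    return clock() - start


def probe(targets, samples=1, clock=time.perf_counter):
    """Time each target `samples` times.

    targets: dict mapping a DC label to a zero-argument callable that
             performs one query against that DC (supplied by the caller,
             hypothetical here).
    Returns a dict mapping each label to a list of elapsed seconds.
    """
    results = {name: [] for name in targets}
    for _ in range(samples):
        for name, run_query in targets.items():
            results[name].append(time_query(run_query, clock))
    return results


def text_graph(results, scale=10, width=60):
    """Render one bar per DC showing its worst response time.

    Each '#' stands for 1/scale of a second; bars are capped at `width`.
    """
    lines = []
    for name, timings in sorted(results.items()):
        if not timings:
            continue  # nothing measured for this DC
        worst = max(timings)
        bar = "#" * min(width, round(worst * scale))
        lines.append(f"{name:<12} {worst:7.3f}s {bar}")
    return "\n".join(lines)
```

To use it you would build the targets dict with one callable per DC (some DCs serving Exchange, some not), run probe() from a scheduled job every 1-5 minutes, and keep appending the timings somewhere so you can compare the morning logon window against the rest of the day.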
