Up to this point, all we've really talked about is storing these puppies. For me, the real question is whether all of these user objects can actually be put to use. For example, if you wanted to use them for authentication and authorization, you would presumably have to start adding them to groups (unless you plan to refer to them individually in ACLs). That means you have to allow for a certain percentage of group objects in the DIT to "support" the user objects. Then there are the actual servers these folks would have to connect to in order to do anything. Even if you limit yourself to scenarios where no one actually logs on to a server, you will run into any number of practical constraints from other directions.
Granted, this isn't nearly as interesting as the pure theoretical limits of the technology, but it does remind us that we all deploy AD for a myriad of reasons. If the Hippies succeeded in lobbying the UN for a user account for every human being (and most great apes), we would probably find that we had to partition well before a billion.

Wook

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
Sent: Sunday, April 16, 2006 7:04 PM
To: [email protected]
Subject: RE: [ActiveDir] User Accounts

Excellent post Brett, had me laughing and learning all the way. Even folks who don't understand it should read it IMO, probably twice.

Dean cleared me up on the RIDs. It sounds like someone decided to artificially limit them to 30 bits (not even 32 or 31 as I surmised), so 1 billion is a good round number to go with. Possibly two people left that team previously and both took a bit with them.

joe

--
O'Reilly Active Directory Third Edition - http://www.joeware.net/win/ad3e.htm

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brett Shirley
Sent: Sunday, April 16, 2006 8:47 PM
To: [email protected]
Subject: RE: [ActiveDir] User Accounts

Eric's quoting didn't come across in Pine so well, so I've improved it by using ">>" where he was quoting others ...

*Ahem* ... for the hex heads ...

ESE limits: The underlying store (aka ESE or JET Blue) does not have a 4.2 billion row constraint on the number of rows in a single table. ESE will support anywhere from 2^1 up to 2^(~240*8) rows in a single table, _depending upon your primary key_ ... and if you found ESE's old max of 9.95e+583 rows woefully undersized, you'll be able to go to around _I think_ 2^(~1875*8) rows in Vista ... if you can find the storage for it [1].
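Brett's point that the row ceiling depends on the primary key can be made concrete by counting distinct key values. This is a simplified sketch, assuming each row needs a unique key and ignoring how ESE actually normalizes keys; the function names are hypothetical, and the key widths (a signed 32-bit DNT starting at 1, a 63-bit ID, ~240- and ~1875-byte keys) are taken from the thread.

```python
"""Row ceilings implied by primary-key width (an illustrative simplification).

Assumes every row needs a distinct key value and ignores how ESE really
normalizes keys, so these are upper bounds on key space, not ESE guarantees.
"""

import math


def signed_int_key_rows(bits: int, first: int = 1) -> int:
    """Distinct positive keys for a signed integer key that starts counting at `first`."""
    return (2 ** (bits - 1) - 1) - first + 1


def unsigned_int_key_rows(bits: int) -> int:
    """Distinct keys for an unsigned integer key of `bits` bits."""
    return 2**bits


def byte_key_rows(key_bytes: int) -> int:
    """Distinct fixed-length byte-string keys of `key_bytes` bytes."""
    return 256**key_bytes


def order_of_magnitude(n: int) -> int:
    """floor(log10(n)); math.log10 accepts arbitrarily large Python ints."""
    return math.floor(math.log10(n))


if __name__ == "__main__":
    print(f"AD DNT, signed 32-bit from 1: {signed_int_key_rows(32):,}")
    print(f"Exchange-style 63-bit ID:     {unsigned_int_key_rows(63):,}")
    for key_bytes in (240, 1875):
        exp = order_of_magnitude(byte_key_rows(key_bytes))
        print(f"{key_bytes}-byte key: about 10^{exp} rows")
```

Because the thread's key lengths are approximate, the byte-key lines give only an order of magnitude and are not meant to reproduce the quoted 9.95e+583 exactly.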
AD design limits: Active Directory, however, chose a primary key (the DNT) that is only 32 bits and signed, so restricting it to positive values limits it to 2.1 billion rows (as ~Eric mentions). But this is not ESE's fault, nor an ESE limitation. Exchange, for example, chose a 63-bit message ID for its message table (called "1-23" IIRC), and is thus limited to no more than 2^63 (about 9.22 quintillion) rows, though probably a bit less due to the way they parse up the message ID. The Exchange message-row limit clearly shows that ESE is not limited to 2.1 or 4.2 billion rows in a single table. This is why it is crucial to distinguish ESE itself from the data layer / schema (of AD) constructed on top of it.

At this point we think we've established the max number of objects in an AD database, BUT the actual hard limit will be the minimum of several competing constraints, any of which could reduce us far lower:

1. Dean points out "over the lifetime of the database". This is crucial to understand; consider his meaning, he is right on about that. This is again an AD limitation, not an ESE limitation, though. AD could've concocted (not even that hard) a scheme to reuse rows / DNTs.

2. joe pointed out the 16 TB DB size limit, and he is right about that. It means that at 2 billion objects, your net aggregate cost per object (including the SD, which may be single-instanced, the link values, and the ESE overhead to maintain the database: indices, rows, record format, etc.) must be below 8 KB / object. This is worth noting because the average size of ONLY the raw data (i.e. excluding ESE overhead) _in the datatable_ of an AD user in our primary corp domains is 11,924 bytes. Dang certs.

3. Eric also points out that the LID (Long-Value ID) is a signed int (again, 31 bits available in the positive value space), so we could be limited to fewer than 2 billion objects if each object had a couple of "burst long values" (only _burst_ LVs use LIDs). LV = Long-Value, not Link Value, for this discussion. This _IS_ an ESE limitation. Experience tells us replPropertyMetaData and supplementalCredentials on typical AD users are burst, and thus the limit could be as low as 1 billion.

4. SIDs (well, RIDs actually) can limit how many security principals you use, but RIDs are a security aspect, so I have no idea whether you can use 32, 31, or fewer bits of that number space. I suspect 1 billion but don't know that at all.

Anyway, a long time ago we (some AD people) went through all the various aspects, issues, etc. and came up with "the safe value", that special value we wanted to claim / support ... and we started saying 1 billion was the official limit. I updated the Wikipedia topic on it a while back.

The issue joe mentioned with the number of pages in an ESE database being 2^31 ... I like to state it as: "Jordie (my pseudonym for a particularly talented developer) took away the high bit before he moved off the ESE team, and won't give it back." <g> That is the funny way to say that paranoia drove one of us to cap it to explicitly positive page numbers. Given that the file system is limited to 16 TB for a single file for some particular (?default? 4k? max?) "allocation size", I don't really see this being fixed anytime soon...

My confidence ranges from 53% to 72% for all the above info ... I don't give a confidence of more than 80% to anything I didn't personally verify in code, and never a confidence of over 90% to anything I didn't personally test to confirm the code worked like it looked ... that is experience talking. Confidences of 53% to 72% probably mean talented and smart / non-blowhard types told me this information.

*Cough* ... for the realists ...
I've heard of two production ADs in excess of 50 M objects (less than 100 M though), and have seen 46, 85 and 100 M object test DITs. I've never seen an AD database in excess of 100 GB in size. Basically, I'm worried about neither the number of objects nor the size of AD databases, as clearly people haven't even gotten within an order of magnitude of the theoretical limits, and we've still tested higher than any production deployment I've heard of / seen. 3 - 5 M is common for e-commerce directories.

While theoretically we could give ~2/7ths of the world an account in a single AD database, that is not practical. Limitations on backup/restore time, SLAs, and the amount of query load per server will likely cause one to scale out and _probably_ partition (via NCs replicated to only some ADAM instances) before going to billion-scale. Managing database size at these scales is non-trivial, and drives the real per-server numbers of objects / database sizes one should support well below 1 billion. Even e-commerce doesn't care about these kinds of numbers, because if you look at the income of the 1 billionth richest person in the world, you'll probably realize she/he is not worth selling to. Only hippies and the U.N. care about going above 1 billion accounts.

[1] Which you can't, as there are only IIRC ~1.0e+83 [or 84 or 82?] particles in the universe anyway.

Sorry if this mail used too much lingo; it was aimed at the super experts (Dean, joe, et al). I'll try to digest it into a series of more edible blog posts that explain the terms as they're introduced ... :P

Anyway, all I'm saying is that the Garage Door Operator has never heard of this 2.1 or 4.2 billion row limit of an ESE database you speak of ...

Cheers,
Brett

P.S. - I've never heard of negative link IDs; I'm most curious to see Eric's description of this ...

On Sat, 15 Apr 2006, Eric Fleischman wrote:

> Good thread.
>
> A few corrections, for the sake of keeping the search engines fresh....
>
>> The underlying store used by AD supports a theoretical maximum of 4.2 billion rows (limited by the 32 bit DNT or distinguished name tag)
>
> Actually, you can only have 2^31 DNTs. This is because we start at 1, but it is actually a signed int. So we only get up to ~2bil or so, and don't use the negative side. Sorry, you can't have the bit back, unless you ask REALLY nicely. <g>
>
>> A row could be said to correlate to an object but it's certainly not a one-to-one relationship since rows also house many other structures such as tables, long-values, etc
>
> Ah, no, not quite (thankfully :-)).
>
> There is a similar limit for the number of long values (it doesn't work the same way, but the mechanics are omitted for the sake of brevity), but it has nothing to do with row count in the data table. Long values are burst out to their own b-tree, and as such are not related to the DNT count max that you were talking about before. In fact, the LID concept is entirely orthogonal to the max row count governed by DNTs that was being discussed.
>
> Dean and I also IM'd on this thread some, and the concept of link values also came up. Rest assured, link values also do not consume DNTs; they are stored entirely differently.
>
> But I do agree with the general feeling here, though for a slightly different reason. :) A row being used on a DC does not necessarily correlate with only what people think of as "their objects hosted by that particular server." You have phantoms, structural phantoms, schema definitions, etc. Further, GCs of course drive the limitation in large forests when the large number of objects is in domain NCs (more on this below).
>
>> So ... to my knowledge, there's no user-related maximum other than the ESE constraints outlined above. Hundreds of millions of users seems perfectly practical.
>> I personally have no first-hand experience of a directory of that scale but if memory serves I believe public documentation does exist referencing either (or both) test or production directories well within this arena.
>
> There is actually a subtle point here... there is the max number of users in a single directory instance (i.e., on one given DC/ADAM instance), and the max number in the entire distributed system. They are somewhat different.
>
> In the ADAM world (read: no GCs), it is entirely possible to have a series of instances, each of which houses different NCs, and each NC approaches the limits mentioned in this thread (i.e., each has 2bil objects, say). So long as no one instance breaks the thresholds, you are golden.
>
> It is only AD that can't play this game, because GCs of course have partial NCs. But ADAM, no worries. Well, unless your large number of objects in AD are in NDNCs.
>
> The larger directories I have worked with had ~100M objects on a single server. I haven't seen people break that on a single box... but I don't deny it has been done, I just haven't seen it. :-)
>
> Oh yeah, the concept of negative linkIDs somehow came up in conversation as well. I'll blog about that, I think. Perhaps even tonight, if I get my stuff done.
>
> ~Eric
>
> ________________________________
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
> Sent: Saturday, April 15, 2006 11:15 AM
> To: [email protected]
> Subject: RE: [ActiveDir] User Accounts
>
> Actually I am going to bust myself here before Dean or someone else does. The SIDs are going to be limited into the billions. Not due to the SID structure, but due to locations where RIDs are stored as DWORDs (32 bits) instead of as 6 bytes (48 bits). ADAM thoughts still stand, as they use the GUID logic for producing the SIDs; they are not based on a domain SID coupled with an artificially limited 32 bit "RID".
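joe's correction (and his later note, at the top of this thread, that RIDs appear to be capped at 30 bits) comes down to powers of two. A small sketch, assuming the RID is an unsigned field and ignoring reserved or well-known RIDs, compares the widths mentioned in the thread; `RID_WIDTHS`, `rid_space` and `describe` are illustrative names, not any Windows API.

```python
"""Compare the RID space available at different field widths (upper bounds only)."""

# Widths discussed in the thread: the 30-bit cap joe mentions, a 32-bit DWORD,
# and the 6-byte (48-bit) width he contrasts it with. Reserved RIDs are ignored.
RID_WIDTHS = {"30-bit cap": 30, "32-bit DWORD": 32, "48-bit (6 bytes)": 48}


def rid_space(bits: int) -> int:
    """Number of distinct values in an unsigned field of `bits` bits."""
    return 1 << bits


def describe(n: int) -> str:
    """Render a count using short-scale names (trillion, billion, million)."""
    for scale, name in ((10**12, "trillion"), (10**9, "billion"), (10**6, "million")):
        if n >= scale:
            return f"{n / scale:.2f} {name}"
    return str(n)


if __name__ == "__main__":
    for label, bits in RID_WIDTHS.items():
        print(f"{label:>18}: {rid_space(bits):,} ({describe(rid_space(bits))})")
```

The 48-bit row is roughly where joe's earlier "trillions" estimate lands, while the 30-bit row is the ~1 billion figure the thread settles on.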
>
> --
> O'Reilly Active Directory Third Edition - http://www.joeware.net/win/ad3e.htm
>
> ________________________________
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of joe
> Sent: Saturday, April 15, 2006 11:49 AM
> To: [email protected]
> Subject: RE: [ActiveDir] User Accounts
>
> I agree with Dean on this. :o)
>
> The only logical or implementation-related user limitation I could think of off the top of my head would be around SIDs, and you are talking a number in the trillions for Active Directory and much much errr much higher for ADAM, since they changed how SIDs are generated[1].
>
> For completeness, though not directly related to Christine's question, I also wanted to add that the other physical limit is simply one of size, which is ~16TB. This is governed by the max pages of ESE (2147483646[2]) coupled with the page size used for the Active Directory DB, which is 8KB. That works out to 8*1024*2147483646 / 1099511627776[3] or 15.9999TB.
>
> joe
>
> [1] See discussion in book mentioned in signature[7]
>
> [2] This max page count is publicly available in the ESE docs. It is located on the page http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ese/ese/jetcreatedatabase2.asp?frame=true however note there is a doco bug where it says that is 2^32 - 2 and it obviously isn't... It is 2^31 - 2[4]. Why not 2^32 - 2, which effectively doubles the size of the DB for those who find ~16TB a trifle claustrophobic? You would have to ask our Garage Door guy, but I __know__ that the page vars are specified as 32 bit "longs" and I would __theorize__ it is to avoid hitting bit issues and make it easier (and faster) for comparisons and calculations so you don't have to watch out for overflows, etc. This isn't something you tend to think about in scripting and languages like VB and .NET, but I can assure you, something below your code has to handle it and it is extra work. So not using the high bit gets you a nice one bit buffer[5], which sounds like very little but is a lot of buffer for the calculations that would need to be made.
>
> [3] This is the number of bytes in a TB, 1024^4. If you had that much in pennies you would be a billionaire. But still not as rich as billg.
>
> [4] I have submitted this feedback to MSDN for a second time. Usually they are a little better about that when you submit something. :) Oh, how do I know which number is the correct one? I cheated and looked at the source. ;o)
>
> [5] Not like a storage buffer but a programming buffer, sort of like putting tape up when painting so you don't have to go and do the extra work of scraping (or repainting another colour) later.
>
> [6] Why are you reading this footnote, I didn't reference it. :)
>
> --
> [7] O'Reilly Active Directory Third Edition - http://www.joeware.net/win/ad3e.htm
>
> ________________________________
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dean Wells
> Sent: Saturday, April 15, 2006 9:48 AM
> To: Send - AD mailing list
> Subject: RE: [ActiveDir] User Accounts
>
> That number isn't accurate I'm afraid. The underlying store used by AD supports a theoretical maximum of 4.2 billion rows (limited by the 32 bit DNT or distinguished name tag) within its lifetime; deleted objects (garbage collected or otherwise) do not return row numbers to the available pool. A row could be said to correlate to an object but it's certainly not a one-to-one relationship since rows also house many other structures such as tables, long-values, etc.
> Note that the limitation also differs from DC to DC, since long-standing DCs will have less row space available than those recently promoted. Windows 2003 does not address this limitation (although improvements have been made in other areas).
>
> So ... to my knowledge, there's no user-related maximum other than the ESE constraints outlined above. Hundreds of millions of users seems perfectly practical. I personally have no first-hand experience of a directory of that scale but if memory serves I believe public documentation does exist referencing either (or both) test or production directories well within this arena.
>
> --
> Dean Wells
> MSEtechnology
> * Email: [EMAIL PROTECTED]
> http://msetechnology.com
>
> ________________________________
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Medeiros, Jose
> Sent: Friday, April 14, 2006 10:39 PM
> To: [email protected]
> Subject: RE: [ActiveDir] User Accounts
>
> I was told 5 billion objects (in theory) when I took the Windows Server 2000 course "Designing a Microsoft Windows 2000 Networking Services Infrastructure", taught by Cathy Moya at Quickstart Technologies (now with Microsoft).
>
> Joe, has Microsoft changed this in AD 2003?
>
> Jose
>
> ________________________________
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christine Allen
> Sent: Friday, April 14, 2006 7:51 AM
> To: [email protected]
> Subject: [ActiveDir] User Accounts
>
> Hello,
>
> How many user accounts can Active Directory 2000/2003 support (including email)?
>
> -Christine
>
> Christine N. Allen
> Systems Engineer
> BMC HealthNet Plan
> 2 Copley Place
> Boston, MA 02116
> 617-748-6034
> 617-293-4407
>
> [EMAIL PROTECTED]

List info : http://www.activedir.org/List.aspx
List FAQ : http://www.activedir.org/ListFAQ.aspx
List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
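Brett's list boils down to taking the minimum of several independent caps. The sketch below combines them under loudly stated assumptions: the constants are the figures quoted in this thread (8 KB pages, 2^31 - 2 max pages, signed 32-bit DNTs and LIDs starting at 1), the per-object profile is hypothetical, and the function names are illustrative rather than any AD or ESE API.

```python
"""Effective object ceiling as the minimum of competing caps (a sketch).

Every constant below is taken from this thread, not from an official source,
and the per-object profile passed to object_caps() is a hypothetical input.
"""

PAGE_SIZE = 8 * 1024            # AD database page size (8 KB), per joe
MAX_PAGES = 2**31 - 2           # ESE max page count, per joe's footnote [2]
SIGNED_32_POSITIVE = 2**31 - 1  # positive range of a signed 32-bit int starting at 1
TB = 1024**4                    # bytes in a TB, per joe's footnote [3]


def max_db_bytes() -> int:
    """Largest database file the page cap allows."""
    return PAGE_SIZE * MAX_PAGES


def budget_per_object(objects: int) -> int:
    """Whole bytes each object may cost (ESE overhead included) to fit in max_db_bytes()."""
    return max_db_bytes() // objects


def dnt_headroom(dnts_ever_allocated: int) -> int:
    """DNTs left on one DC; deleted objects never return their DNT (Dean's point)."""
    return max(SIGNED_32_POSITIVE - dnts_ever_allocated, 0)


def object_caps(burst_lvs_per_object: int, bytes_per_object: int) -> dict[str, int]:
    """Lifetime object count allowed by each constraint for a hypothetical profile."""
    caps = {
        "DNT": SIGNED_32_POSITIVE,
        "DB size": max_db_bytes() // bytes_per_object,
    }
    if burst_lvs_per_object > 0:
        # Only burst long values consume LIDs, and LIDs are also signed 32-bit.
        caps["LID"] = SIGNED_32_POSITIVE // burst_lvs_per_object
    return caps


def effective_cap(caps: dict[str, int]) -> tuple[str, int]:
    """The binding constraint is whichever cap is smallest."""
    name = min(caps, key=caps.__getitem__)
    return name, caps[name]


if __name__ == "__main__":
    print("shortfall from 16 TB:", 16 * TB - max_db_bytes(), "bytes")    # two 8 KB pages
    print("budget at 2**31 objects:", budget_per_object(2**31), "bytes")  # just under 8 KB
    # Hypothetical profile: two burst LVs per object, 6 KB net cost per object.
    caps = object_caps(burst_lvs_per_object=2, bytes_per_object=6 * 1024)
    print(caps, effective_cap(caps))
```

With that hypothetical profile the LID cap binds at about 1.07 billion objects, in line with Brett's "as low as 1 billion". Per Dean's point, a long-standing DC should compare against dnt_headroom() for the DNTs it has already consumed rather than against the full DNT range.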
