On Sat, May 8, 2010 at 12:36 PM, Kiran Ayyagari <[email protected]> wrote:
> On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <[email protected]> wrote:
> > On 5/8/10 9:43 AM, Alex Karasulu wrote:
> >>
> >> Hi all,
> >>
> >> Any thoughts about using the globally visible UUID in the XDBM partition
> >> design as the primary key for Entries instead of using a
> >> partition-specific Long ID?
> >>
> >> I'm thinking we will need to implement certain features one day. Let me
> >> list them and also point out why using the globally unique UUID might be
> >> advantageous:
> >>
> >> (1) System wide DN and Entry Cache
> >>
> >>     Rather than having each partition manage its own cache, a central DN
> >>     and Entry cache makes sense. In this case a global identifier for an
> >>     entry might come in handy for hashing cached values.
> >>
> >> (2) Nested Partitions, Default Root Partition, Hash Partitioning and
> >>     Range Partitioning
> >>
> >>     At some point we will want to have nestable partitions. This means we
> >>     can have one ADS Partition mounted under another ADS Partition with
> >>     operation routing taking place properly to the nested partition where
> >>     appropriate.
> >>
> >>     Nested partitions will also allow us to have a default root
> >>     partition from which we can mount other partitions. The default root
> >>     partition is nice to have since it allows us to add administrative
> >>     areas and their administrative points with subentries onto the root
> >>     empty string DN. It also makes it so the RootDSE is now stored in this
> >>     partition properly with persistence. Right now the RootDSE is
> >>     generated and not mutable.
> >>
> >>     Hash partitioning and range partitioning entail distributing entries
> >>     across partitions under some container entry based on some value. Hash
> >>     partitioning uses the value's hash to distribute entries, whereas
> >>     range partitioning uses ranges of values to distribute the entries.
> >>     So it's not really the DN that determines which partition the entry
> >>     is pushed into but this hash or range value. This makes it so we can
> >>     scale to very large numbers of entries in the DIT while also
> >>     distributing the disk access load across several disk spindles, as
> >>     does Oracle's RDBMS in these kinds of configurations.
> >>
> >> (3) Global Indices
> >>
> >>     If we use a globally unique UUID instead of a partition-specific Long
> >>     ID then we can expose index segments managed by partitions to higher
> >>     layers to construct global indices. These global indices can then be
> >>     used to conduct searches outside of the partition, one step higher.
> >>     This makes it possible for us to implement certain virtual directory
> >>     strategies regardless of the partition implementations used in a
> >>     server's configuration. The XDBM search algorithm can leverage these
> >>     global indices or delegate sub partition search to a partition if a
> >>     partition uses its own search mechanism. There's a lot to be said
> >>     here but this is neither the time nor the place to expand on this
> >>     topic. But global indices are a key factor for several things,
> >>     including virtualization.
> >>
> >> Thoughts?
> >>
> >
> > One other advantage will be that we won't need to store an increment on
> > the disk anymore. Atm, each time we add an element in the backend, we have
> > to ask for a Long, which has to be unique. This is potentially a
> > bottleneck, and it's costly, as this unique Long has to be stored on disk.
> besides this I see some more advantages
>
> *if* we keep the entryUUID of the entry also as the ID of the entry then
> building the DN using the RDN index will be a lot easier (because finding
> the parent of an entry now requires a full DN construction, which can be
> avoided by doing a reverse lookup in the RDN index if we know the entry's
> ID)
>
> >
> > I don't yet see any other negative impact we can get by using UUID instead
> > of Long, except that it will require more disk space (slightly).
> yeap, and RDN index also takes more disk space now
>

Yeah, but this disk space is negligible. Basically the UUID is 16 bytes and
the Long is 8 on intel arch. We're talking about 8 extra bytes per ID here. So
no need to even worry about it. The benefits will outweigh the disadvantages
if this is all we can see for disadvantages.

Regards,
--
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu
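
For illustration only, here is a minimal Java sketch of the points discussed
in this thread: the 16-byte vs 8-byte size difference, a UUID used as the key
of a shared cache, and a hash-based choice of partition. The class name
UuidKeySketch, the helpers toBytes and partitionFor, and the sample DN are
all hypothetical; none of this is the actual XDBM code, and a real hash
partitioning scheme might hash a different value.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UuidKeySketch {
    // Serialize a UUID into its 16-byte binary form (two 64-bit halves).
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Hypothetical hash-partitioning rule: derive a partition number from
    // the UUID's hash code. floorMod keeps the result non-negative.
    static int partitionFor(UUID id, int partitionCount) {
        return Math.floorMod(id.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        UUID entryId = UUID.randomUUID();

        // A single cache keyed by UUID can be shared across partitions,
        // because the key is unique server wide (assumed sample DN).
        Map<UUID, String> dnCache = new HashMap<>();
        dnCache.put(entryId, "ou=system");

        System.out.println("UUID key: " + toBytes(entryId).length
                + " bytes, Long key: " + Long.BYTES + " bytes");
        System.out.println("cached DN: " + dnCache.get(entryId));
        System.out.println("partition: " + partitionFor(entryId, 4));
    }
}
```

The same UUID can serve as cache key, index key and partition selector,
which is the property a partition-local Long ID lacks.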
