On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu <[email protected]> wrote:

> Alex Karasulu wrote:
>
>> Now, this is an approach where we use plain keys (i.e., keys can have
>> various sizes), which is not really efficient, as we may have to
>> allocate more pages than necessary to store nodes and leaves.
>> OpenLDAP uses another approach, which is smarter: they use the hash
>> value of each key to retrieve the element. Obviously, this means we
>> have to compare the keys when we reach the leaf, as we may have more
>> than one key with the same hash value, and it also destroys the
>> ordering (one can't compare two hash values, as the ordering will be
>> different), but in most cases it's not really a big deal.
>> The main advantage of such an approach is that suddenly nodes have a
>> fixed size (a hash can be stored as an int, and the references to a
>> page are longs), so in a fixed page size we can store a fixed number
>> of elements. Assuming that a node needs at least 28 bytes to store its
>> header and PageIO, in a 512-byte page we can store (512 - 28) /
>> ((nbValues + 1) x (8 + 8) + nbKeys x 4) elements, so 16 keys (64 bytes)
>> and 17 values (272 bytes). We have 148 bytes remaining in this case.
>> Atm, we store 16 elements per node, which requires many physical
>> pages, i.e., many disk accesses.
>>
>> This is something that is worth investigating in the near future.
>>
>>
>> Sounds like we need a minimal perfect order-preserving hash function.
>>
>
> These are a pain to try to use for this purpose. All of the perfect
> order-preserving hashes I've found only work because you generate the
> hash function based on knowing the entire set of data in advance. When
> you add new records you need to create a new hash function, and thus
> the hash values of every key change.
>
> I.e., these are only useful for static data sets.
>
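As a side note, here is a quick back-of-the-envelope sketch of the sizing arithmetic quoted above. It is plain Java with made-up names (NodeSizing, bytesUsed, maxKeysPerPage are not Mavibot APIs), and it just plugs in the figures from the mail: a 512-byte page, 28 bytes of header/PageIO overhead, a 4-byte int hash per key and two longs (8 + 8 bytes) per reference.

/**
 * Back-of-the-envelope sizing for a fixed-size node that stores int key
 * hashes instead of variable-length keys. Uses the figures from the quoted
 * mail: 512-byte page, 28 bytes of header/PageIO overhead, 4 bytes per key
 * hash, 8 + 8 bytes per reference. Names are illustrative only.
 */
public class NodeSizing
{
    private static final int PAGE_SIZE = 512;
    private static final int HEADER_SIZE = 28;       // node header + PageIO
    private static final int KEY_HASH_SIZE = 4;      // int hash per key
    private static final int REFERENCE_SIZE = 8 + 8; // two longs per reference

    /** Bytes consumed by a node holding nbKeys hashes and nbKeys + 1 references. */
    static int bytesUsed( int nbKeys )
    {
        return HEADER_SIZE + nbKeys * KEY_HASH_SIZE + ( nbKeys + 1 ) * REFERENCE_SIZE;
    }

    /** Largest nbKeys that still fits in a single page, under these assumptions. */
    static int maxKeysPerPage()
    {
        int nbKeys = 0;

        while ( bytesUsed( nbKeys + 1 ) <= PAGE_SIZE )
        {
            nbKeys++;
        }

        return nbKeys;
    }

    public static void main( String[] args )
    {
        // With 16 keys: 28 + 16*4 + 17*16 = 364 bytes used, 148 bytes free
        System.out.println( "16 keys use " + bytesUsed( 16 ) + " bytes, "
            + ( PAGE_SIZE - bytesUsed( 16 ) ) + " remaining" );
        System.out.println( "max keys in a 512-byte page: " + maxKeysPerPage() );
    }
}

Under these byte sizes the arithmetic matches the quote (16 keys plus 17 references use 364 of the 512 bytes, leaving 148 free); the 16 mentioned above is the current per-node setting rather than the absolute maximum a page could hold.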
That's lame ... no silver bullets I guess.

--
Best Regards,
-- Alex
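To make the limitation Howard describes concrete, here is a toy sketch (plain Java, purely illustrative, not taken from any library): the simplest order-preserving minimal perfect "hash" is just the rank of a key in the sorted key set, which only works because the whole key set is known up front. Inserting one new key re-numbers every key that sorts after it, which is exactly why these functions only suit static data sets.

import java.util.TreeSet;

/**
 * Toy illustration of why order-preserving minimal perfect "hashes" are
 * only practical for static key sets: the simplest such function is the
 * rank of a key in the sorted set, and adding one key changes the value
 * assigned to every key that sorts after it.
 */
public class StaticOrderPreservingHash
{
    public static void main( String[] args )
    {
        TreeSet<String> keys = new TreeSet<>();
        keys.add( "apple" );
        keys.add( "cherry" );
        keys.add( "pear" );

        printRanks( keys );   // apple=0, cherry=1, pear=2

        // Adding a key that sorts in the middle re-numbers everything after
        // it, i.e. the "hash function" must be rebuilt for the new data set.
        keys.add( "banana" );
        printRanks( keys );   // apple=0, banana=1, cherry=2, pear=3
    }

    private static void printRanks( TreeSet<String> keys )
    {
        int rank = 0;

        for ( String key : keys )
        {
            System.out.println( key + " -> " + rank++ );
        }
    }
}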
