Re: [ApacheDS] Default partition design ideas

Emmanuel Lecharny Sun, 03 Feb 2008 13:49:41 -0800

Howard Chu wrote:

    A good hash function is one that evenly distributes the input keys
    across the
    entire hash table. This makes hashing extremely cache-unfriendly
    when doing
    sequential traversals of a database, or sequentially bulk-loading.
Very good point about these opposing factors. So I figure OpenLDAP just
uses the cache in the underlying B-Tree instead of managing some kind of
separate entry cache?
No, we have a separate entry cache too. I originally wrote back-bdbwithout any entry caching, but Jong @ IBM benchmarked/profiled it andimplemented an entry cache for it before 2.1 was released. (Back thenI was happy just to get it working, at speeds comparable to back-ldbm.How things change ;)
The motivation here is that the serialized data that we store in thedatabase is not directly usable with our in-memory data structures.

I have done some very quick-and-dirty micro benchmarks for DNserialization. This is not directly related to Entries, but the idea wasto tests different solutions for serializing data. I do a serializationfollowed by a deserialization. Here are the results :

Serialization/deserialization for 1 million DN = "ou= Some People +dc= And Some anImAls,dc = eXample,dc= cOm"

Serializing te entire DN structure (DN, RDNs, AttributeTypeAndValues) :28 507 msSerialiazing a DN as a string and parsing it back when deserializing :37 400 ms

With a transformation of this String into byte[] : 37 467 ms
Test the Creation of a DN and Parsing (no serialization): 13 697 ms
Normalization (no serialization): 15 643 ms

Note : those serializations are done in memory (no disk access at all)

Conclusion :

1) obviously, it's more costly to parse a DN than to rebuild the entirestructure.2) on my laptop, 1.7Ghz centrino, I can at best serialize anddeserialize around 35 000 DN per second.

The point (2) is very important : an entry iis way bigger than a DN, butalso less complex (no parsing involved). We can consider that dealingwith an Entry will cost more. This imply that having a pre-cache (anentry cache) will be much more efficient than having a single page cacheof serialized data. I would suggest that we should just cache Btreestructure leaves (those which only contains keys), and leave the entriesto the pre-cache. It might be the best solution (tips : to be tested andvalidatd !)


Note : I have done these performance tests before reading this mail.

As such, even when the B-Tree cache is 100% effective, there's a costassociated with deserialization that we can avoid by using our ownentry cache. I rewrote the entry cache for 2.2, and it's been gettingslight tweaks ever since. E.g., in 2.3 I added threaded AVL trees toour utility library so that we can do fast sequential traversals ofvarious caches.

I don't know why AVL trees are so under-utilized... They are probablyone of the most interesting structure when you want to balance the costof sorting data and the cost of minimizing the tree height... Maybebecause it's complex to implement compared to B-trees ?

As we're still in the early days of 64 bit computing, I'd be temptedto play with a single-level-store concept again, where disk addressesmap 1:1 to memory addresses. In that case, we could simply writeentries in their in-memory form straight to disk, and eliminate alevel of caching. This concept worked back when virtual address spaceswere 32 bits but data volumes tended to be on the order of 24 bits.When the overall data volume approaches the size of the virtualaddress space, this kind of scheme falls down.

This is a ath we might want to explore in java, as we have memory-mappedfiles. I don't know ...

I guess this is due to synchronization. A splay based cache then defeats
itself since it requires a splay operation (memory write) on everylookup.
Right.
There are plenty of interesting data structures/system architecturesto explore, and now you've got the lab resources to try them out.

OpenLdap design is just fine for such tests, as you can define aspecific backend for a partition, so you can test a lot of differentsolutions. We are working on this too, but it's not ready yet (but willbe soon, as Alex is on it)


Thanks Howard !


--
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

Re: [ApacheDS] Default partition design ideas

Reply via email to