Excellent comments and I will reply to this soon but please note that NavigableMap is in jdk 6 :(.
Alex On Jan 30, 2008 8:00 AM, Emmanuel Lecharny <[EMAIL PROTECTED]> wrote: > Hi Alex, > > I will comment your mail extensively, with some ideas I have. > > Alex Karasulu wrote: > > Background: > > > > ApacheDS used JDBM a B+Tree implementation in it's default partition > > implementation. JDBM does not allow duplicate keys (same key with many > > values) and so we abstract this using the Table interface. Tables > > support duplicate keys. So to allow this JdbmTable which implements > > this interface uses a TreeSet or something called a BTreeRedirect to > > manage the values of duplicate keys. When some threshold of values is > > reached like say 50 for the same key, the JDBM table switches from > > using a TreeSet to using a BTreeRedirect. A BTreeRedirect is a pointer > > to another BTree in the same db file. The BTree pointed to contains > > the values of the duplicate key as it's keys. > > > > Problem: > > > > I was looking into the new Cursor interface and came to the > > realization that we can no longer use TreeSets efficiently or properly > > for that matter. To build a Cursor on this we would have to be able > > to traverse up and down as the user of the Cursor desired and there's > > simply no way to do this. > Considering that a treeset internally contains a NavigableMap, traverse > a TreeSet is possible. The problem is that you can't move backward as > soon as you have moved forward, without fetching a new reference of the > TreeSet. It's basically an enumaration. The question is : do we need to > move back and forth ? > > > > Proposal: > > > > Implement an in memory linked splay tree with splay Cursor using the > > links for traversal. Splay trees are idea for recurring lookups which > > would be the case for us. Furthermore splay trees are great for > > caches. Splay trees reposition the most frequently accessed nodes > > close to the root of the tree making access times faster. They > > perform poorly though when inserting in an ordered fashion. I think > > this is acceptable for our use of this data structure for a TreeSet > > (which uses a red-black tree) and ideal for use in devising a better > > entry cache higher up in the server. > > > > Thoughts? > There are many different things to consider. First, let's divide, then > conquer... > > 1) We have many different kind of data we want to manipulate. Let's list > them : > o Entries. They are stored into the MasterTable. An entry is a > serialized Attribute, associated with a unique identifier, which is > currently a Long (which allows our server to store up to 18 446 744 073 > 709 551 615 entries ... 18 quintillions !). Those long must be fetched > quickly, as we will always get an entry by its ID. Using a BTree for > that is time consuming, as fetching an entry will be done in O(log(N)). > We should use a Hash to speedup entries retrieval (of course, at the > risk that the worst case scenario can transform a search operation to > cost N read, but this is unlikely...). The only point to consider is > that this ID is incremental, so the hash function must be chosen > carefully. > > o DNs. This is a very specific case. DNs are unique, mono valued, > unordered elements. DNs are not attributes, too. It's unordered, as a DN > order depends on the RDN attribute's own order, which can be different > depending on the RDN's attribute. The consequence is that we should use > a Hash to store the DNs. > > o Attributes with single values. Typically, 'uid'. There are supposed to > be unique, too (ie, you can imagine that an uid can be found more than > once in a LDAP server (there is no reason why you should not be allowed > to store a user's UID in many places, if this user is registred in more > than once... For instance, I use the same uid on several computers, so I > may have to store these UID in different branches in my LDAP server). We > can use a Hash table to store such an attribute, because it's supposed > to be unique. If you have more than one reference, then the question > will be : how do I manage more than one value associated with this > attribute (cf next point). > Using a Hash table here means you know that this attribute is not only > holding a single value, but also it's unique within the whole server. > Dangerous ... > > o All other attributes (including single valued attributes which can be > found more than once) : HashMap is not really a good choice, except if > the admin *knows* that the number of values won't be enormous. And we > have specific cases where you have a Attribute <-> Value relation where > an attribute may have thousands (even millions !) of values. The > 'member' attribute is a good example. But we also have to consider the > ATtribute <-> entries relation. A search request like > (ObjectClass='person') will use the ObjectClass index, and get all the > entries containing the 'person' value for tje ObjectClass attribute. > Potentially, thousands... The current implementation, as explained by > Alex, instead of storing a list of all the entry IDs linked to the > 'person' value, stores a TreeSet, which is managed in a separate file. > There is an indirection, a little bit like when you deal with a N-N > relation in a RDBMS context (when two tables are related with a N-N > relation, then you create a third table holding the relation between > both IDs). > Can we do better ? Because this solution is costly, and not really > effective : you have to deal with the structure duality (you hold either > a list or a reference to a treeset, depending on the number of > elements), and this make it more complex to manage cache (i'm not sure > at this point that we can cache such elements...) > Alex proposal (using Splay trees) won't solve the problem, IMHO. It's > just a better way to store data in memory, with extended navigation > possibilities, but it does not help when it comes to store this > structure on the disk. As the data are moved frequently, even on read, > this will increase the cost of searching so much that it will kill the > main advantage of LDAP vs RDBMS. So, yes, we can use Splay trees in > memory, but we can't store splay trees on disk. > > Some other points to take into consideration is also the fact that we > may run short of memory, but we don't want the server to stop running, > just because we got an OOM exception. One way to handle this would be to > use weak references for the cache.I have no idea what it may imply for a > Splay tree implementation, but this must be studied. > > Now, for those attributes, I have described another solution, based on > the fact that we use 'long' to store IDs : > http://cwiki.apache.org/confluence/display/DIRxSRVx11/Backend. I think > this might be interesting to dig those ideas a little bit to see if it > fits our need. > > > Note : Alex's need is more urgent, so I guess that we won't be able to > implement something very different than what we currently have. I just > wanted to present some of the ideas i'm playing with for years now... > (but I never had enough time to go any further than drafting some small > pages on confluence ;) > > > -- > -- > cordialement, regards, > Emmanuel Lécharny > www.iktek.com > directory.apache.org > > >
