Alex Karasulu wrote:
Hi all,
Hi Alex,
The ServerEntry stores the DN of the entry.  I think this is good for better
code organization.  However, storing the entry together with its DN into
the master table is a very bad idea.  The DN should instead be managed in
the NDN and DN indices.
I think you are wrong. Storing the DN within the entry is a very good idea (tm) :)

And the DN should also be managed in the NDN and DN indices.
The reason why I'm suggesting this is because modifyDN operations will be
extremely cumbersome when performed on a DN with many children.
The ModifyDN operation will be slow. So what ? How many ModifyDN operations will occur compared to Search operations ? Storing the DN within entries was specifically done to speed up the search operation, as it allows us to return a result in 2 accesses to the backend :
- an index access (it can be the DN index, which is why we need it, or any other attribute index)
- and an access to the master table

When we didn't have this DN stored within the entry, we had one more access, to the DN index, just because we had to get the DN to return an entry.

This was a costly operation, because we had to do log(N) DN comparisons (N = number of entries).
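
The difference in access counts can be sketched with a toy model. This is only an illustration: the class, the three maps and the string "serialization" are hypothetical stand-ins, not ApacheDS or JDBM code, and sorted maps merely play the role of the B-tree tables.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy partition: sorted maps stand in for the backend tables, and each lookup is counted. */
class SearchAccessSketch {
    // Hypothetical tables for illustration only; these are not ApacheDS classes.
    final Map<String, Long> attributeIndex = new TreeMap<>(); // "uid=jdoe" -> entry id
    final Map<Long, String> dnIndex = new TreeMap<>();        // entry id -> DN
    final Map<Long, String> masterTable = new TreeMap<>();    // entry id -> serialized entry
    final boolean dnInEntry;
    int accesses = 0; // table lookups performed by the last search

    SearchAccessSketch(boolean dnInEntry) {
        this.dnInEntry = dnInEntry;
    }

    void add(long id, String dn, String indexKey, String attributes) {
        attributeIndex.put(indexKey, id);
        dnIndex.put(id, dn);
        // With the DN embedded, the master record carries everything a search result needs.
        masterTable.put(id, dnInEntry ? "dn: " + dn + "\n" + attributes : attributes);
    }

    /** Returns the entry as it would be sent to the client, or null when nothing matches. */
    String search(String indexKey) {
        accesses = 0;
        Long id = attributeIndex.get(indexKey);
        accesses++;                      // 1: index access
        if (id == null) {
            return null;
        }
        String record = masterTable.get(id);
        accesses++;                      // 2: master table access
        if (dnInEntry) {
            return record;
        }
        String dn = dnIndex.get(id);
        accesses++;                      // 3: extra DN index access, only to recover the DN
        return "dn: " + dn + "\n" + record;
    }

    public static void main(String[] args) {
        for (boolean embedded : new boolean[] { true, false }) {
            SearchAccessSketch p = new SearchAccessSketch(embedded);
            p.add(1L, "uid=jdoe,ou=People,dc=example,dc=com", "uid=jdoe", "uid: jdoe\ncn: John Doe");
            p.search("uid=jdoe");
            System.out.println("DN in entry = " + embedded + " : " + p.accesses + " accesses");
        }
    }
}
```

Both variants return the same result to the client; only the number of table lookups differs (2 with the DN embedded, 3 without).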

So, yes, ModifyDN has been slowed down big time, for the benefit of all the searches, a price I am personally willing to pay.
  It will
require each child and the target entry to be retrieved and written to disk
to-from the master just to change its DN.  Plus we still have the updn and
ndn indices which also get updated so this is wasteful and causes a lot of
unnecessary access operations. Also note that we can store a lot more DNs
in a cached JDBM page than we can entries.  So this will produce more memory
consumption along with cache turn over.
The memory waste is something we can manage. We are storing two kinds of data :
- trees
- DN and Entries

If we have to favor one of those two guys, it would be the trees. We can cache some data, but at some point, with millions of entries, you won't be able to store more than a few of them in memory anyway. Having the DN cached or not is just a small part of the problem. (We can consider that for a 1k entry - a small one -, the DN is less than 10% of its size)

I don't think we should overlook the extra memory it takes to store the DN within the entry.

Anyway, if we don't do that, you immediately realize that you have to do another lookup on the DN index to get the DN for an entry you want to return to the client, an operation which may need disk access, many comparisons, etc...


If the modifyDN operation changes the RDN of the target, a master table
access is unavoidable because the target's RDN attribute in the entry must
change. However the children of the target can avoid a master table
read-write operation since their RDN attributes do not change.  This is
again only avoidable if we do not store the DN in the master.  Ideally you
just want to update the indices when entries are moved around.
Granted. But I don't think we should optimize the server to make the ModifyDN operation as fast as possible. I don't think you will have more than a few ModifyDN operations (with children being moved) per year on a serious LDAP server instance.
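
The cost being conceded here can be made concrete with a small sketch of a subtree rename. Everything in it is an assumption made for illustration: the class, the two maps and the naive string-suffix DN matching (no normalization) are hypothetical and do not reflect the actual ApacheDS partition code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy subtree rename that counts index writes and master table writes. */
class ModifyDnSketch {
    // Hypothetical tables for illustration only; these are not ApacheDS classes.
    final Map<Long, String> dnIndex = new TreeMap<>();     // entry id -> DN
    final Map<Long, String> masterTable = new TreeMap<>(); // entry id -> serialized entry
    final boolean dnInEntry;
    int indexWrites = 0;
    int masterWrites = 0;

    ModifyDnSketch(boolean dnInEntry) {
        this.dnInEntry = dnInEntry;
    }

    void add(long id, String dn, String attributes) {
        dnIndex.put(id, dn);
        masterTable.put(id, dnInEntry ? "dn: " + dn + "\n" + attributes : attributes);
    }

    /** Renames oldBase to newBase, touching the target and all of its descendants. */
    void rename(String oldBase, String newBase) {
        for (Map.Entry<Long, String> e : dnIndex.entrySet()) {
            String dn = e.getValue();
            boolean isTarget = dn.equals(oldBase);
            boolean isDescendant = dn.endsWith("," + oldBase);
            if (!isTarget && !isDescendant) {
                continue;
            }
            String newDn = dn.substring(0, dn.length() - oldBase.length()) + newBase;
            e.setValue(newDn);           // the DN index is updated in both designs
            indexWrites++;
            Long id = e.getKey();
            if (dnInEntry) {
                // The embedded DN forces a rewrite of every affected master record.
                String record = masterTable.get(id);
                masterTable.put(id, "dn: " + newDn + record.substring(record.indexOf('\n')));
                masterWrites++;
            } else if (isTarget) {
                // Only the target is rewritten, because its RDN attribute changes
                // (the attribute edit itself is omitted in this sketch).
                masterTable.put(id, masterTable.get(id));
                masterWrites++;
            }
        }
    }

    public static void main(String[] args) {
        for (boolean embedded : new boolean[] { true, false }) {
            ModifyDnSketch p = new ModifyDnSketch(embedded);
            p.add(1L, "ou=People,dc=example,dc=com", "ou: People");
            p.add(2L, "uid=a,ou=People,dc=example,dc=com", "uid: a");
            p.add(3L, "uid=b,ou=People,dc=example,dc=com", "uid: b");
            p.rename("ou=People,dc=example,dc=com", "ou=Users,dc=example,dc=com");
            System.out.println("DN in entry = " + embedded + " : "
                    + p.indexWrites + " index writes, " + p.masterWrites + " master writes");
        }
    }
}
```

In this model the index writes are identical in both designs; the master table writes grow with the size of the subtree only when the DN is stored in the entry.
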
I've been against this drive to push the DN into the master table combined
with the entry from day one along with the drive to remove the NDN and UPDN
indices. The obvious reason is due to these issues.
You are just fixing a bug on the Modify operation right now, and being focused on it, you are losing the whole picture, I think. Step back, let's discuss the pros and cons with a global vision, and maybe you will realize that it was a good move.
 I just did not have
the time to clarify exactly why until I started looking into this bug which
was recently introduced:


 *DIRSERVER-1224 <https://issues.apache.org/jira/browse/DIRSERVER-1224>*
As I reviewed the code it was clear that this will cost much more on all the
flavors of ModifyDN operations.  Just imagine a ModifyDN to rename ou=People
to ou=Users if it contains 100M users in it.  I'd recommend we agree to fix
this as recommended then I can push a JIRA on it so this can be fixed in the
future (but before 2.0 since the correction will cause db
incompatibilities).
Again, this is shortsighted. You are focusing on extreme cases :
- server with 100M entries
- and applying a ModifyDN (a rare operation) which moves 100M entries

Anyway, you have a point here : ModifyDN will be awfully slow, just because we will have to rewrite the entire master table. If you think about what would have happened with the previous implementation, then you would just have to rewrite the entire DN table. I would say it's simply 2 or 3 times faster. Big deal ! We are not talking about an order of magnitude here.

To sum up the advantages of having the DN within the entry, I would say that avoiding a DN lookup during a search will save a lot of time (not an order of magnitude), say, 20%, for _every_ search operation. I don't think that people search using the DN often, compared to searches using another attribute to get the entry.

Let's start a discussion if needed. We can even add a switch on the server to tell it whether to store the DN within the entry or not, depending on which kind of operation will be the most frequent one : searches, or modifyDN with many children moved.

--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

