Alex Karasulu wrote:
Hi all,
Hi Alex,
The ServerEntry stores the DN of the entry. I think this is good for better
code organization. However, storing the entry together with its DN in
the master table is a very bad idea. The DN should instead be managed in
the NDN and DN indices.
I think you are wrong. Storing the DN within the entry is a very good
idea (tm) :)
And the DN should also be managed in the NDN and DN indices.
The reason I'm suggesting this is that modifyDN operations will be
extremely cumbersome when performed on a DN with many children.
ModifyDN operations will be slow. So what? How many ModifyDN operations
will occur compared to Search operations? Storing the DN within entries was
specifically done to speed up the search operation, as it allows
us to return a result in 2 accesses to the backend:
- an index access (this can be the DN index, which is why we need
it, or the index of any other attribute)
- and an access to the master table
When we didn't have the DN stored within the entry, we needed one more
access, to the DN index, just to get the DN of the entry we were
returning.
This was a costly operation, because we had to do log(N) DN
comparisons (N = number of entries).
So, yes, ModifyDN has been slowed down big time, for the benefit of all
the searches, and that is a price I am personally willing to pay.
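To make the access count concrete, here is a minimal Java sketch under stated assumptions: plain in-memory maps stand in for the JDBM-backed indices and master table, and the names (`SearchAccessSketch`, `cnIndex`, `idToDn`, `Entry`) are hypothetical, not the ApacheDS API. It counts backend accesses for one search on `cn`, with and without the DN stored in the master-table record (Java 16+ for the record).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SearchAccessSketch {
    // Hypothetical counter of backend accesses (index or master-table lookups).
    static int accesses = 0;

    // Master-table record; dn is null when the DN is not stored with the entry.
    record Entry(String dn, String cn) {}

    // Master table: entry ID -> entry.
    static final Map<Long, Entry> master = new HashMap<>();
    // Attribute index: cn value -> entry ID.
    static final Map<String, Long> cnIndex = new TreeMap<>();
    // Reverse side of a hypothetical DN index: entry ID -> DN.
    static final Map<Long, String> idToDn = new TreeMap<>();

    // Returns the DN of the entry matching cn, counting each backend access.
    static String search(String cn) {
        accesses = 0;
        Long id = cnIndex.get(cn);
        accesses++;                      // 1: attribute index
        if (id == null) {
            return null;
        }
        Entry e = master.get(id);
        accesses++;                      // 2: master table
        String dn = e.dn();
        if (dn == null) {
            dn = idToDn.get(id);
            accesses++;                  // 3: extra DN index lookup
        }
        return dn;
    }

    public static void main(String[] args) {
        cnIndex.put("jdoe", 1L);
        idToDn.put(1L, "cn=jdoe,ou=People,dc=example,dc=com");

        master.put(1L, new Entry("cn=jdoe,ou=People,dc=example,dc=com", "jdoe"));
        search("jdoe");
        System.out.println("DN stored in entry: " + accesses + " accesses");

        master.put(1L, new Entry(null, "jdoe"));
        search("jdoe");
        System.out.println("DN only in index:   " + accesses + " accesses");
    }
}
```

Run as is, it reports 2 accesses for the first layout and 3 for the second; in a real disk-backed DN index, that third access is where the extra reads and DN comparisons come in.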
It will
require each child and the target entry to be retrieved from and written
back to the master table just to change its DN. Plus we still have the
updn and ndn indices, which also get updated, so this is wasteful and
causes a lot of unnecessary access operations. Also note that we can store
a lot more DNs than entries in a cached JDBM page, so this will increase
memory consumption along with cache turnover.
The memory waste is something we can manage. We are storing two kinds of
data:
- trees
- DN and Entries
If we have to favor one of those two, it would be the trees. We can
cache some data, but at some point, with millions of entries, you won't
be able to keep more than a few of them in memory anyway. Having the DN
cached or not is just a small part of the problem. (For a 1 KB entry, a
small one, the DN is less than 10% of its size.)
I don't think we should overlook the extra memory it takes to store the
DN within the entry.
Anyway, if we don't do that, you immediately realize that you have to do
another lookup on the DN index to get the DN for an entry you want to
return to the client, an operation which may need disk access, many
comparisons, etc.
If the modifyDN operation changes the RDN of the target, a master table
access is unavoidable because the target's RDN attribute in the entry must
change. However the children of the target can avoid a master table
read-write operation since their RDN attributes do not change. Again, this
is only avoidable if we do not store the DN in the master table. Ideally
you just want to update the indices when entries are moved around.
Granted. But I don't think we should optimize the server just to make
the ModifyDN operation as fast as possible. I don't think you will
see more than a few ModifyDN operations (with children being moved) per
year on a serious LDAP server instance.
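The two ModifyDN cost profiles argued over here can be sketched in Java. This is a hypothetical in-memory model, not ApacheDS code: `ModifyDnSketch`, `rdnIndex` and `parentIndex` are illustrative names, and design B assumes the hierarchy lives in an RDN index plus a parent-ID index so that full DNs are derived on demand. It counts only master-table writes for renaming `ou=People` to `ou=Users`; index updates are not counted.

```java
import java.util.HashMap;
import java.util.Map;

public class ModifyDnSketch {
    static final String SUFFIX = "dc=example,dc=com";

    // Design A: the master table keeps the full DN with each entry (ID -> DN).
    static final Map<Long, String> masterWithDn = new HashMap<>();

    // Design B: the master table holds no DN (only the RDN attribute is modeled);
    // the hierarchy lives in a hypothetical RDN index and parent index.
    static final Map<Long, String> masterNoDn = new HashMap<>();
    static final Map<Long, String> rdnIndex = new HashMap<>();
    static final Map<Long, Long> parentIndex = new HashMap<>();

    // Loads ou=People (ID 0) with `children` leaf entries cn=user1..cn=userN.
    static void load(int children) {
        masterWithDn.clear();
        masterNoDn.clear();
        rdnIndex.clear();
        parentIndex.clear();
        masterWithDn.put(0L, "ou=People," + SUFFIX);
        masterNoDn.put(0L, "ou=People");
        rdnIndex.put(0L, "ou=People");
        for (long id = 1; id <= children; id++) {
            String rdn = "cn=user" + id;
            masterWithDn.put(id, rdn + ",ou=People," + SUFFIX);
            masterNoDn.put(id, rdn);
            rdnIndex.put(id, rdn);
            parentIndex.put(id, 0L);
        }
    }

    // Design A: the target and every descendant record is read, patched and rewritten.
    static int renameWithDnInEntry(String oldDn, String newDn) {
        int masterWrites = 0;
        for (Map.Entry<Long, String> e : masterWithDn.entrySet()) {
            String dn = e.getValue();
            if (dn.equals(oldDn) || dn.endsWith("," + oldDn)) {
                e.setValue(dn.substring(0, dn.length() - oldDn.length()) + newDn);
                masterWrites++;
            }
        }
        return masterWrites;
    }

    // Design B: only the target's record (its RDN attribute) and its RDN index key change.
    static int renameWithDnInIndices(long targetId, String newRdn) {
        masterNoDn.put(targetId, newRdn);
        rdnIndex.put(targetId, newRdn);
        return 1; // children's master-table records are untouched
    }

    // Design B derives a DN on demand by walking up the parent index.
    static String dnOf(long id) {
        StringBuilder dn = new StringBuilder(rdnIndex.get(id));
        for (Long p = parentIndex.get(id); p != null; p = parentIndex.get(p)) {
            dn.append(',').append(rdnIndex.get(p));
        }
        return dn.append(',').append(SUFFIX).toString();
    }

    public static void main(String[] args) {
        load(1000);
        int a = renameWithDnInEntry("ou=People," + SUFFIX, "ou=Users," + SUFFIX);
        System.out.println("DN in entry:   " + a + " master-table writes");
        load(1000);
        int b = renameWithDnInIndices(0L, "ou=Users");
        System.out.println("DN in indices: " + b + " master-table write");
        System.out.println(dnOf(1L));
    }
}
```

With 1000 children, design A rewrites 1001 master-table records while design B rewrites 1; the price of design B shows up in `dnOf`, which is the extra per-search lookup discussed in this thread.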
I've been against this drive to push the DN into the master table combined
with the entry from day one, along with the drive to remove the NDN and UPDN
indices, precisely because of these issues.
You are fixing a bug in the Modify operation right now, and by focusing
on it, you are losing sight of the whole picture, I think. Let's step back,
discuss the pros and cons with a global vision, and maybe you will
realize that it was a good move.
I just did not have
the time to clarify exactly why until I started looking into this bug which
was recently introduced:
*DIRSERVER-1224 <https://issues.apache.org/jira/browse/DIRSERVER-1224>*
As I reviewed the code it was clear that this will cost much more on all
flavors of ModifyDN operations. Just imagine a ModifyDN to rename ou=People
to ou=Users if it contains 100M users. I'd recommend we agree to fix
this as proposed; then I can open a JIRA issue so this can be fixed in the
future (but before 2.0, since the correction will cause db
incompatibilities).
Again, this is shortsighted. You are focusing on extreme cases:
- server with 100M entries
- and applying a ModifyDN (a rare operation) which moves 100M entries
Anyway, you have a point here: ModifyDN will be awfully slow, just
because we will have to rewrite the entire master table. With the previous
implementation, you would just have had to rewrite the entire DN table. I
would say that is simply 2 or 3 times faster. Big deal! We are not talking
about orders of magnitude here.
To sum up the advantages of having the DN within the entry, I would say
that avoiding a DN lookup for a search will save a lot of time
(not an order of magnitude; say, 20%) on _every_ search operation. I
don't think people search using the DN often, compared to searches
using another attribute to get the entry.
Let's start a discussion if needed. We could even add a switch to the
server telling it whether or not to store the DN within the entry,
depending on which kind of operation will be the most frequent one:
searches, or modifyDN with many children moved.
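The switch could be sketched as a hypothetical per-partition enum plus a crude cost model. `DnStorageSwitchSketch`, `DnStorage` and `choose` are illustrative assumptions, not an existing ApacheDS setting. The model reuses the counts from this thread (2 versus 3 backend accesses per search, one master-table write per moved entry when the DN is embedded) and weighs a read access and a write equally, which a real disk-backed store would not.

```java
public class DnStorageSwitchSketch {
    // Hypothetical switch: where the partition keeps each entry's DN.
    enum DnStorage { IN_ENTRY, IN_INDICES_ONLY }

    // Backend accesses for one search returning one entry.
    static long searchAccesses(DnStorage mode) {
        return mode == DnStorage.IN_ENTRY ? 2 : 3;
    }

    // Master-table writes for a ModifyDN on an entry with `descendants` entries below it.
    static long modifyDnMasterWrites(DnStorage mode, long descendants) {
        return mode == DnStorage.IN_ENTRY ? descendants + 1 : 1;
    }

    // Crude total for a workload; equal weights for reads and writes are an assumption.
    static long workloadCost(DnStorage mode, long searches, long modifyDns, long avgDescendants) {
        return searches * searchAccesses(mode)
                + modifyDns * modifyDnMasterWrites(mode, avgDescendants);
    }

    // Picks the cheaper storage mode for the given workload mix.
    static DnStorage choose(long searches, long modifyDns, long avgDescendants) {
        long inEntry = workloadCost(DnStorage.IN_ENTRY, searches, modifyDns, avgDescendants);
        long indicesOnly = workloadCost(DnStorage.IN_INDICES_ONLY, searches, modifyDns, avgDescendants);
        return inEntry <= indicesOnly ? DnStorage.IN_ENTRY : DnStorage.IN_INDICES_ONLY;
    }

    public static void main(String[] args) {
        // Search-heavy directory: a million searches, a couple of subtree renames.
        System.out.println(choose(1_000_000L, 2L, 1_000L));
        // Rename-heavy edge case: few searches, huge subtrees moved.
        System.out.println(choose(100L, 10L, 1_000_000L));
    }
}
```

Under these assumed weights the first workload favors `IN_ENTRY` and the second favors `IN_INDICES_ONLY`, which is the trade-off the switch would let an administrator make.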
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org