Nigel de Costa wrote:
> I have a large number of entries (500) that I would like to read from an
> iPlanet Directory Server. I know the DNs for the entries and I am using the
> Netscape LDAP SDK for Java.
>
500 is not a large number. I regularly do queries that return 50,000 results.
(But I always limit the attributes returned to just those in which I am
interested--see below.)
>
> Is it quicker to query by:
>
> a) creating a large search filter with 500 equality expressions OR b) use the
> LDAPConnection.read() method to read each entry using the DN
The key here is to make sure your indexing strategy is in sync with your most
commonly used search filters. For example, if you regularly search for entries
whose foo equals bar, then you should be indexing the foo attribute. If you are
using a more complex filter, then you should also be indexing every attribute
used in the filter. You should also be paying attention to the kinds of searches
you do against an attribute. For example, if you usually just look for people
who have a "foo" attribute, then an "exists" (or "pres") index for that
attribute is sufficient. If you are looking for foos with a specific value, then
you need an "equality" index. And if you look for attributes matching a specific
pattern (e.g. telephone numbers starting with a given area code), then you need
a substring index. You may need more than one type of index for an attribute.
(In fact, you often will.)
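As a rough rule of thumb, the shape of a simple filter clause tells you which
index it needs. A minimal sketch of that mapping (the helper and the attribute
names are made up for illustration; real filters can need several indexes at
once):

```java
public class IndexHint {
    // Illustrative only: given a simple one-clause filter string, name the
    // index type it needs. Not a real filter parser.
    static String indexNeeded(String filter) {
        int eq = filter.indexOf('=');
        String value = filter.substring(eq + 1, filter.length() - 1);
        if (value.equals("*")) return "pres";   // (foo=*)    presence search
        if (value.contains("*")) return "sub";  // (foo=617*) substring search
        return "eq";                            // (foo=bar)  equality search
    }

    public static void main(String[] args) {
        System.out.println(indexNeeded("(telephoneNumber=617*)")); // sub
    }
}
```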
Of course, if you know the DNs you can always search for each DN (by setting the
search base to the DN, the scope to base, and the filter to objectclass=*, which
is probably what the JSDK's LDAPConnection.read() method does).
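That equivalence can be sketched like so. The class here is just a hypothetical
holder for the three parameters you would hand to the SDK's search call; the
SCOPE_BASE value of 0 mirrors base-object scope:

```java
public class ReadAsSearch {
    static final int SCOPE_BASE = 0; // base-object scope, as in the SDK

    final String base;
    final int scope;
    final String filter;

    // Reading one entry by DN is equivalent to a base-scope search on that
    // DN with the filter (objectclass=*).
    ReadAsSearch(String dn) {
        this.base = dn;
        this.scope = SCOPE_BASE;
        this.filter = "(objectclass=*)";
    }

    public static void main(String[] args) {
        // The DN below is made up for illustration.
        ReadAsSearch r = new ReadAsSearch("uid=ndecosta,ou=People,o=example.com");
        System.out.println(r.base + " scope=" + r.scope + " " + r.filter);
    }
}
```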
DNs are always indexed (that's how the directory server works). This is probably
why your DN-by-DN searching performs better; you're using an indexed search
rather than an unindexed one. However, if your large filter were on indexed
attributes, you might see a slight performance improvement over DN-by-DN reads,
because of the decreased overhead of making multiple requests to the server.
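For instance, a batch of known values can be folded into one OR filter of
equality clauses. A minimal sketch, assuming the target entries share an
indexed naming attribute (uid here is just an example):

```java
import java.util.List;

public class OrFilter {
    // Build one OR filter with an equality clause per value, e.g.
    // (|(uid=alice)(uid=bob)). Each clause benefits from an equality index
    // on the attribute.
    static String orFilter(String attr, List<String> values) {
        StringBuilder sb = new StringBuilder("(|");
        for (String v : values) {
            sb.append('(').append(attr).append('=').append(v).append(')');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(orFilter("uid", List.of("alice", "bob")));
        // (|(uid=alice)(uid=bob))
    }
}
```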
>
> (Empirically I have found that the SDK or the Dir Server does not like large
> search filters and so it is better to break the filter into smaller
> batches).
>
This is not necessarily true, as long as your indexing is set up as above, your
directory server has enough memory to hold your directory in RAM, and your
client-side machine is able to deal with the chunk of data that gets returned by
the larger search. You can minimize the impact of large searches on both your
client machine and the network by limiting your search results to just those
attributes in which you are interested. The larger the search, the more
important this is. The difference in both the time it takes to obtain your
results and the amount of memory it takes to hold them can be dramatic.
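If you do need to batch, the partitioning itself is plain list slicing; a
hedged sketch, with the actual SDK search call left as a comment since it
needs a live connection (attribute names and variables in the comment are
assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedSearch {
    // Partition a big value list into fixed-size batches. One search per
    // batch keeps both the filter size and the returned chunk manageable.
    static List<List<String>> batches(List<String> values, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i += size) {
            out.add(values.subList(i, Math.min(i + size, values.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        // With the Netscape SDK you would then run, per batch, something like:
        //   String[] attrs = { "cn", "mail" };  // only the attributes you need
        //   conn.search(baseDN, LDAPConnection.SCOPE_SUB, batchFilter, attrs, false);
        System.out.println(batches(List.of("a", "b", "c", "d", "e"), 2).size()); // 3
    }
}
```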
In most SDKs, you pay for each result twice. The API will query the server and
get a response back. That response contains the search results, plus any
messages the server may decide to send to the client, in a specific and quite
funky format. Most SDKs will then rummage through the response and create
objects for each search result, and sometimes for each message. This means that
your client program will hold the data at least twice: once in its raw form and
once in its object form. And that's before you even get your hands on it! Your
application may also be saving bits and pieces of the results elsewhere. So
streamlining your query results can have a dramatic effect on memory usage and
thus performance.
>
> Nigel de Costa
Hope this helps...
Dave Kernen