Using cursors for walking entry lists (and saving the cursor state) is certainly useful inside the server, but there's nothing you can safely gain from the client side.

It kind of sounds like you're talking about Virtual List Views, not paged results. Remember that search responses in LDAP/X.500 are unordered by definition, so it makes no sense for a standards-compliant client to send an initial request with a Paging control saying "start at entries 200-300": the order in which entries will be returned is simply not defined. You need something like VLV (Virtual List View), which in turn requires SSS (Server Side Sorting), to even begin thinking about this; it's not a job for Paged Results.

Even if you have a stable ordering (which SSS actually cannot guarantee) you still can't reliably identify response #200, since the underlying entries may be added or deleted while the search is in progress.
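
For what it's worth, here is roughly what that pairing looks like from a
client library that supports both controls - a sketch only, using the
UnboundID LDAP SDK purely for illustration; the host, base DN, and sort
attribute are placeholders, and the server still has to implement SSS
and VLV for any of it to work:

  // Sketch only: a sorted (SSS) search plus a VLV window of roughly
  // entries 200-300, using the UnboundID LDAP SDK for illustration.
  import com.unboundid.ldap.sdk.LDAPConnection;
  import com.unboundid.ldap.sdk.SearchRequest;
  import com.unboundid.ldap.sdk.SearchResult;
  import com.unboundid.ldap.sdk.SearchScope;
  import com.unboundid.ldap.sdk.controls.ServerSideSortRequestControl;
  import com.unboundid.ldap.sdk.controls.SortKey;
  import com.unboundid.ldap.sdk.controls.VirtualListViewRequestControl;

  public class VlvWindow {
      public static void main(String[] args) throws Exception {
          LDAPConnection conn = new LDAPConnection("ldap.example.com", 389);
          SearchRequest req = new SearchRequest(
                  "ou=people,dc=example,dc=com", SearchScope.SUB,
                  "(objectClass=person)", "cn");

          // SSS gives the result set a defined order; only then does an
          // offset like "start at entry 200" mean anything.  The VLV
          // control asks for the target entry at offset 200 plus the
          // 100 entries after it (content count unknown, no context ID).
          req.setControls(
                  new ServerSideSortRequestControl(new SortKey("cn")),
                  new VirtualListViewRequestControl(200, 0, 100, 0, null));

          SearchResult result = conn.search(req);
          System.out.println("Got " + result.getEntryCount() + " entries");
          conn.close();
      }
  }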

Emmanuel Lecharny wrote:
Using cursors in ADS will also allow us to implement the Paging
control (RFC 2696) very easily! We could even define a new control
(and a new RFC), as we will be able to go back and forth, which is not
possible with the Paging control.

The Partition interface in ApacheDS will soon expose search results by
returning a Cursor<ServerEntry>  instead of a NamingEnumeration.

It will be Cursor<Entry> (as this is the top-level interface)

Depending
on the filter used, this is a composite Cursor which leverages partition
indices to position itself without having to buffer results.  This allows
the server to pull entries satisfying the search scope and filter one at a
time and return them to the client without massive latency or memory
consumption.  It also means we can process many more concurrent requests,
as well as process single requests faster.  In addition, a resultant search
Cursor can be advanced or positioned using the methods described above just
by having the nested index-based Cursors advance or jump to the appropriate
index positions.  We already have some of these footprint benefits with
NamingEnumerations; however, the positioning and advancing capabilities are
not present with NamingEnumerations.
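
To make that concrete, iterating and repositioning such a Cursor inside
the server would look roughly like the sketch below. The Cursor method
names are assumed from the API described above, and send() is a purely
hypothetical stand-in for writing a SearchResultEntry back to the client:

  // Sketch: walking a search Cursor<Entry> inside the server.
  void streamResults(Cursor<Entry> cursor) throws Exception {
      try {
          cursor.beforeFirst();
          while (cursor.next()) {
              Entry entry = cursor.get();   // one entry at a time, no buffering
              send(entry);                  // hypothetical
          }

          // Unlike a NamingEnumeration, the same Cursor can be
          // repositioned and walked backwards.
          cursor.afterLast();
          while (cursor.previous()) {
              send(cursor.get());
          }
      } finally {
          cursor.close();
      }
  }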

Further experiments and research will help a lot here. We may have
problems too, as this will be a concurrent area: some of the data may
be modified while the cursor is being read.

During the course of this work, I questioned whether or not client-side
Cursors would be possible.  Perhaps not under the current protocol, without
some controls or extended operations.  Technical barriers in the protocol
aside, I started to dream about how this neat capability could impact
clients.  With it, clients can advance or position themselves over the
results of a search as they like.  Clients may even be able to freeze a
search in progress, by having the server tuck away the server-side Cursor's
state, to be revisited later.

The major improvement with client cursors is that the client won't
have to manage a cache of data anymore. Thinking about the Studio, if
you browse a big tree with thousands of entries, when you want to get
entries [200-300] - assuming you show entries in blocks of 100 -
you have to send another search request _or_ you have to cache all the
search results in memory. What a waste of time, or a waste of memory!
If we provide such a mechanism, the client won't have to bother with
such complexity. Data will be brought to the client piece by piece:
if the client wants entries 400 to 500, there is no need to fetch the
first 399 entries. If the client has already pumped out the first 100
entries, it's just a simple request on the same cursor, no need to
compute it again.

So, yes, client cursors make sense too.
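
Purely to illustrate the idea, a client-side cursor API could look
something like the following. None of these classes or methods exist
today; the names are invented just to show the usage pattern:

  // Hypothetical client-side cursor API - invented names, for
  // illustration only.
  SearchCursor cursor = connection.searchCursor(
          "ou=people,dc=example,dc=com",
          SearchScope.SUBTREE,
          "(objectClass=person)");

  // Jump straight to entry 400 on the server-side cursor; entries
  // 1-399 are never transferred to, or cached by, the client.
  cursor.absolute(400);

  for (int i = 0; i < 100 && cursor.next(); i++) {
      display(cursor.get());   // hypothetical UI callback
  }

  // Later, the same cursor can be resumed or walked backwards
  // without recomputing the search on the server.
  cursor.previous();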

For lack of a better term, I've likened this to a form
of asynchronous, bidirectional LDAP search.  This would eliminate the need
to bother with paging controls.  It could even be used to eliminate the
thread-per-search problem associated with persistent search.  OK, let me
stop dreaming and start looking at reality so we can determine if this is
even a possibility.

Reality is just a dream come true :) (sometimes it's a nightmare :)

So these characteristics of a Cursor have a profound impact on the semantics
of a search operation - not talking about the protocol yet.  I'm referring
to search as seen from the perspective of client callers using the Cursor:
the front end.  As stated, search operations can be initiated and shelved to
persist the state of the search by tucking away the Cursor in the connection
session.  A Cursor for a search will automatically track its position.

However the protocol imposes some limitations on being able to leverage
these capabilities across the network on an LDAP client.  A search request
begins the search, and entry responses are received from the server, until
the server returns a search result done response which signals the end
of the search operation.  During this sequence, without creative extended
operations or controls, there's little the client can do to influence the
entries returned by the server or to throttle the rate of return.  Of course
size and time limits can be set on the search request, but after issuing the
search these cannot be altered.  Because the LdapMessage envelope contains a
messageId, and all responses contain the messageId of the request they
correspond to, the protocol allows for multiple requests to be issued in
parallel.  Even if client APIs do not allow for it, this is certainly
possible.

The main point is that each client is associated with a session. It's
then easy to handle a context and use it to store metadata (like a
previously created cursor for some search request, a cursor which can be
reused if the underlying data has not been modified).

That brings another matter to the table: if we want to reuse cursors,
we _must_ implement a decent entry cache.

Although I've long forgotten how the paging control works exactly, I still
have a rough idea: forgive me for my laziness and if I'm missing something.
A control specifies some number of results to return per page, and the
server complies by limiting the search to that number, then capping off the
search operation with a search result done.  Cookies in the request and
response controls are used to track the progress, so another search request
for the next page returns the next page rather than initiating the search
from the start.  This breaks a big search up into many smaller search
requests.

This is true from the client perspective. On the server, there should
be only one search, and the remaining results are just waiting for
another search with the same cookie.
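
For reference, the client side of that exchange looks roughly like the
sketch below, using the standard JNDI paged results classes. It's a
minimal sketch; the provider URL, base DN, and filter are placeholders:

  // Sketch of an RFC 2696 paged search using plain JNDI.
  import java.util.Hashtable;
  import javax.naming.Context;
  import javax.naming.NamingEnumeration;
  import javax.naming.directory.SearchControls;
  import javax.naming.directory.SearchResult;
  import javax.naming.ldap.Control;
  import javax.naming.ldap.InitialLdapContext;
  import javax.naming.ldap.LdapContext;
  import javax.naming.ldap.PagedResultsControl;
  import javax.naming.ldap.PagedResultsResponseControl;

  public class PagedSearch {
      public static void main(String[] args) throws Exception {
          Hashtable<String, String> env = new Hashtable<String, String>();
          env.put(Context.INITIAL_CONTEXT_FACTORY,
                  "com.sun.jndi.ldap.LdapCtxFactory");
          env.put(Context.PROVIDER_URL, "ldap://localhost:10389");
          LdapContext ctx = new InitialLdapContext(env, null);

          int pageSize = 100;
          byte[] cookie = null;
          ctx.setRequestControls(new Control[] {
                  new PagedResultsControl(pageSize, Control.CRITICAL) });

          SearchControls sc = new SearchControls();
          sc.setSearchScope(SearchControls.SUBTREE_SCOPE);

          do {
              NamingEnumeration<SearchResult> page = ctx.search(
                      "ou=people,dc=example,dc=com",
                      "(objectClass=person)", sc);
              while (page.hasMore()) {
                  page.next();   // process one entry of the current page
              }

              // The response control carries the cookie that makes the
              // next request continue where this page stopped.
              cookie = null;
              Control[] responseControls = ctx.getResponseControls();
              if (responseControls != null) {
                  for (Control c : responseControls) {
                      if (c instanceof PagedResultsResponseControl) {
                          cookie = ((PagedResultsResponseControl) c).getCookie();
                      }
                  }
              }
              ctx.setRequestControls(new Control[] {
                      new PagedResultsControl(pageSize, cookie, Control.CRITICAL) });
          } while (cookie != null && cookie.length != 0);

          ctx.close();
      }
  }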

This way the client has the ability to intervene in what
otherwise would be a long flood of results in a single large search
operation.  If this page control could also convey information about
positioning and directionality, along with a page size set to 1, we could
implement client-side Cursors with the same capabilities they possess on the
server.

Exactly! For instance, using a negative size would mean going
backward. This is a very minor extension to the paged search RFC, and
it could even be implemented using the very same control, simply adding
some semantics to it.

Another extension would be to add a 'position' to start from.
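
Just to make the idea concrete, the value of such an extended control
could be built along these lines. Everything below is invented for
illustration - the OID, the extra position field, and the signed-size
semantics are not part of any RFC (the ASN.1 helper classes happen to
come from the UnboundID LDAP SDK):

  // Hypothetical "bidirectional paged results" control value:
  //   SEQUENCE { size INTEGER,      -- negative means page backwards
  //              position INTEGER,  -- offset to start from
  //              cookie OCTET STRING }
  import com.unboundid.asn1.ASN1Integer;
  import com.unboundid.asn1.ASN1OctetString;
  import com.unboundid.asn1.ASN1Sequence;
  import com.unboundid.ldap.sdk.Control;

  public class BidirectionalPagedControl {
      // Invented OID; a real extension would need its own assignment.
      public static final String OID = "1.3.6.1.4.1.99999.1.1";

      public static Control create(int size, int position, byte[] cookie) {
          ASN1Sequence value = new ASN1Sequence(
                  new ASN1Integer(size),
                  new ASN1Integer(position),
                  new ASN1OctetString(cookie == null ? new byte[0] : cookie));
          return new Control(OID, true, new ASN1OctetString(value.encode()));
      }
  }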

Paging search results effectively has the server tucking away the
search Cursor state into the client session and pulling it out again to
continue.  This is how we would implement this control today (that is, if
anyone gets the time to do so :) ).
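
In other words, the server-side handling would be little more than the
following sketch; the session type and every helper call here are
invented for illustration, not taken from any existing API:

  // Hypothetical handling of one paged search request on the server.
  void handlePagedRequest(LdapSession session, SearchRequest request,
                          byte[] cookie, int pageSize) throws Exception {
      // Pull the saved Cursor back out of the session, or start anew.
      boolean fresh = (cookie == null || cookie.length == 0);
      Cursor<Entry> cursor = fresh
              ? startSearch(request)                          // hypothetical
              : (Cursor<Entry>) session.get(key(cookie));     // hypothetical

      int sent = 0;
      while (sent < pageSize && cursor.next()) {
          sendEntry(session, cursor.get());                   // hypothetical
          sent++;
      }

      if (sent < pageSize) {
          // Search exhausted: no more state to keep around.
          cursor.close();
          sendSearchResultDone(session, new byte[0]);         // hypothetical
      } else {
          // Tuck the Cursor away in the session for the next page.
          byte[] next = newCookie();                          // hypothetical
          session.put(key(next), cursor);                     // hypothetical
          sendSearchResultDone(session, next);
      }
  }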

BTW change notifications are probably best implemented as a combination of
search and extended operations through unsolicited notifications.  The
client issues a search request with a control similar to the persistent
search request control.  Instead of 'persisting' the search, the search
returns immediately with a search result done response using a result code
to indicate whether or not the server will honor the request to be notified
of changes.

This is a big semantic shift... Not sure that it will fit with the
current LDAP protocol. However, LDAP V4 does not exist yet ;)

There's no reason this approach can't be used in LDAPv3. It's just that no existing LDAPv3 clients or servers support such a control at the moment.
--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
