Alex Karasulu wrote:
Hi Emmanuel,

On Wed, Dec 3, 2008 at 6:25 PM, Emmanuel Lecharny <[EMAIL PROTECTED]> wrote:

The problem I have is the following: we have to remember the pointer to
the last entry we have sent back to the client.

How should we do this? My first approach was pretty naive: we are using a
cursor, so it's easy; we simply store the cursor in the session, and the
next request just has to get this cursor back from the session and
fetch the next N elements from it.
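
A minimal sketch of this "keep the cursor in the session" idea, assuming a plain java.util.Iterator stands in for the server's cursor. PagedSession, startSearch and nextPage are hypothetical names for illustration, not ApacheDS classes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a session that keeps the paged-search cursor.
class PagedSession {
    // Plays the role of the cursor stored in the session (assumption).
    private Iterator<String> cursor;

    // First request: open the cursor and keep it in the session.
    void startSearch(List<String> matchingEntries) {
        cursor = matchingEntries.iterator();
    }

    // Later requests: resume from the stored cursor, read up to pageSize entries.
    List<String> nextPage(int pageSize) {
        List<String> page = new ArrayList<>();
        while (cursor != null && page.size() < pageSize && cursor.hasNext()) {
            page.add(cursor.next());
        }
        if (cursor != null && !cursor.hasNext()) {
            cursor = null; // end of results reached: drop the cursor
        }
        return page;
    }

    boolean hasOpenCursor() {
        return cursor != null;
    }
}
```

Each call to nextPage picks up exactly where the previous one stopped, which is the whole appeal; the price is that the cursor stays alive between requests.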

This has the advantage of being simple, but there are some very important
cons:
- it's memory consuming, as we may keep those cursors in the session for a
very long time
- we will have to close all the cursors when the session is closed (for
whatever reason)
- if some data has been modified since the cursor creation, it may contain
invalid data
- if the user doesn't send an abandon request for the search, those cursors
will remain in the session until it's closed (this is very likely to happen)

So I'm considering an alternative - though more expensive and less
performant - approach :
- we build a new cursor for each request,
- we move forward to the Nth entry in the newly created cursor, and return
the M requested elements
- and when done, we discard the cursor.


I would avoid this approach.  The problem is that the amount of computation
grows quadratically, because for every page you scan back to the point you
were at before you can advance the cursor.  Say you have 100 entries and you
read the first 10.  Then you create a new cursor and ask for elements 11-20.
This means you'll scan through the first 10 elements again, checking whether
each element is a match for the filter, and as you know this shifts a nested
structure of cursors built to reflect the logic of the filter.  So
you're doing a search for 10, then 20, 30, 40, 50, 60 and so on elements.
Yes, I'm aware of that. And I will certainly not go this way ...
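
To put a number on the rescan cost discussed above, here is a small sketch that only counts how many entries get examined. It assumes every examined entry costs the same and ignores the filter evaluation itself; RescanCost and its methods are hypothetical names:

```java
// Counts entries examined under the two strategies (illustrative only).
class RescanCost {
    // A new cursor per page: skip everything already returned, then read the page.
    static long rescanFromStart(int totalEntries, int pageSize) {
        long examined = 0;
        for (int offset = 0; offset < totalEntries; offset += pageSize) {
            int pageEnd = Math.min(offset + pageSize, totalEntries);
            examined += pageEnd; // re-scan [0, offset), then read the page itself
        }
        return examined;
    }

    // One cursor kept in the session: every entry is examined exactly once.
    static long keepCursor(int totalEntries) {
        return totalEntries;
    }
}
```

For the 100 entries and pages of 10 used in the example, rescanFromStart(100, 10) gives 10 + 20 + ... + 100 = 550 entries examined, against 100 when the cursor is kept.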

The pros are
- we don't have to keep n cursors in memory forever.


The whole point of this feature is to maintain state so the search continues
where it left off.  But this should be cheap both for the server and for the
client. This approach is a brute force approach and it's going to mix up a
lot of code in complicated places.

It's OK to hold off on this until we see a better approach.  I'd rather wait
until we feel that eureka light bulb go off.


- from the client POV, it respects the PagedSearch contract
- it's easier to implement as we have less information to keep in the
session, and to restore back

The cons are :
- it's time consuming: if we have N entries to return, with a page size of P,
we will construct N/P cursors.


Yes, and there will be costs for the advances as well.  Both are going to
make this approach limiting.
I'm currently moving forward in the other direction (i.e., storing the cursor in the session).

There are some tricky issues, though. Some of them are related to the way we have designed the server. For instance, when comparing the previous searchRequest with the current one, you have to compare attributes, DN and filters. That's not complicated, except that those elements might not be equal, just because they have not yet been normalized at this point (in SearchHandler).
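
A small illustration of why un-normalized requests fail to compare equal, using only the requested attribute list. The normalization here (trim, lowercase, ignore order and duplicates) is a deliberate simplification and an assumption; real normalization goes through the schema (aliases, OIDs), which this sketch does not attempt. RequestComparison is a hypothetical name:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class RequestComparison {
    // Simplified normalization (assumption): trim, lowercase, drop order and duplicates.
    static Set<String> normalizeAttributes(List<String> requested) {
        Set<String> normalized = new TreeSet<>();
        for (String name : requested) {
            normalized.add(name.trim().toLowerCase(Locale.ROOT));
        }
        return normalized;
    }

    // Two requests ask for the same attributes if their normalized sets match.
    static boolean sameAttributes(List<String> previous, List<String> current) {
        return normalizeAttributes(previous).equals(normalizeAttributes(current));
    }
}
```

A request for ("cn", "SN") and a follow-up for ("sn", "CN") differ as raw lists but match once both sides have been normalized, which is the situation described for SearchHandler.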

This is a big issue. At this point, we can manage to normalize the DN and attributes, but for the filter, this is another story. This makes me think that the Normalize interceptor is not necessary, and that it should be moved up in the stack (in the codec, in fact).

Otherwise, the other problem we have is the Cursor closure. When we are done with them, we should close those guys. This is easy if the client behaves correctly (i.e., sends a last request with 0 as the number of elements to return, or if we reach the end of the entries to return), but if the client doesn't do that, we will end up with potentially thousands of open cursors in memory.

So we need to add a cleanup thread associated with each session, closing the cursor after a timeout has occurred.
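
One possible shape for that timeout cleanup, as a rough sketch only. It assumes a cursor can be treated as a java.io.Closeable and that the current time is passed in explicitly, so the eviction logic does not depend on a real clock. CursorRegistry and its methods are hypothetical names, not existing server code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-session registry of open paged-search cursors.
class CursorRegistry {
    private static final class Tracked {
        final Closeable cursor;
        volatile long lastUsedMillis;

        Tracked(Closeable cursor, long now) {
            this.cursor = cursor;
            this.lastUsedMillis = now;
        }
    }

    private final Map<String, Tracked> cursors = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    CursorRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void register(String id, Closeable cursor, long now) {
        cursors.put(id, new Tracked(cursor, now));
    }

    // Called whenever a paged request reuses the cursor.
    void touch(String id, long now) {
        Tracked t = cursors.get(id);
        if (t != null) {
            t.lastUsedMillis = now;
        }
    }

    // Closes and removes every cursor idle for at least timeoutMillis.
    int evictExpired(long now) {
        int closed = 0;
        Iterator<Tracked> it = cursors.values().iterator();
        while (it.hasNext()) {
            Tracked t = it.next();
            if (now - t.lastUsedMillis >= timeoutMillis) {
                it.remove();
                try {
                    t.cursor.close();
                } catch (IOException e) {
                    // best effort: the cursor is dropped from the registry anyway
                }
                closed++;
            }
        }
        return closed;
    }

    int openCount() {
        return cursors.size();
    }
}
```

The cleanup thread would then only need to call evictExpired(System.currentTimeMillis()) at regular intervals, for instance from a ScheduledExecutorService.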

Those are the two problems I'm currently facing...

Otherwise, the implementation itself is pretty straightforward (well, not that much, but it's just simple code).

Any ideas on how to handle those two problems?
Alex



--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

