Re: Implementing the PagedSearchControl

Emmanuel Lecharny Fri, 05 Dec 2008 08:12:45 -0800

Alex Karasulu wrote:

Hi Emmanuel,


On Wed, Dec 3, 2008 at 6:25 PM, Emmanuel Lecharny <[EMAIL PROTECTED]> wrote:

The problem I have is the following: we have to remember the pointer to
the last entry we have sent back to the client.

How should we do this? My first approach was pretty naive: we are using a
cursor, so it's easy, we simply store the cursor in the session, and the
next request will just have to get this cursor back from the session and
read the next N elements from it.
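
A minimal sketch of this naive approach, assuming a hypothetical
PagedSession class and using a plain java.util.Iterator as a stand-in for
the server's cursor (neither name is taken from the actual code base):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical session object: keeps one open "cursor" per paged search,
// keyed by the cookie handed back to the client.
final class PagedSession {
    private final Map<String, Iterator<String>> cursors = new HashMap<>();
    private int nextCookie = 0;

    // First request: store the cursor in the session and return a cookie.
    String open(Iterator<String> cursor) {
        String cookie = "c" + (nextCookie++);
        cursors.put(cookie, cursor);
        return cookie;
    }

    // Later requests: get the stored cursor back and read the next
    // pageSize entries. The cursor is dropped once it is exhausted.
    List<String> nextPage(String cookie, int pageSize) {
        List<String> page = new ArrayList<>();
        Iterator<String> cursor = cursors.get(cookie);
        if (cursor == null) {
            return page; // unknown cookie, or search already finished
        }
        while (page.size() < pageSize && cursor.hasNext()) {
            page.add(cursor.next());
        }
        if (!cursor.hasNext()) {
            cursors.remove(cookie);
        }
        return page;
    }

    // Number of cursors still held by this session.
    int openCursors() {
        return cursors.size();
    }
}
```

Note that a cursor stays in the map until it is read to the end, which is
exactly the memory concern listed in the cons.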

This has the advantage of being simple, but there are some very important
cons:
- it's memory consuming, as we may keep those cursors in the session for a
very long time
- we will have to close all the cursors when the session is closed (for
whatever reason)
- if some data has been modified since the cursor creation, it may contain
invalid data
- if the user doesn't send an abandon request, those cursors will
remain in the session until it's closed (this is very likely to happen)

So I'm considering an alternative - though more expensive and less
performant - approach:
- we build a new cursor for each request,
- we move forward to the Nth entry in the newly created cursor, and return
the M requested elements
- and when done, we discard the cursor.


I would avoid this approach.  The problem is that it requires a roughly
quadratic amount of computation, because you scan back to the point you
were at before in order to advance the cursor.  Say you have 100 entries and
you start by reading the first 10.  Then you create a new cursor and ask for
elements 11-20.  This means you'll scan through the first 10 elements again,
checking whether each one matches the filter, and as you know this shifts a
nested structure of cursors built to reflect the logic of the filter.  So
you're doing a search over 10, then 20, 30, 40, 50, 60 and so on elements.
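
To put a number on this, here is a small sketch that counts how many
entries get scanned in total when every page request rebuilds the cursor
and skips forward. The class and method names are illustrative only, and it
assumes every scanned entry matches the filter:

```java
// Illustrative cost model for the "new cursor per page" approach.
final class RescanCost {
    // Total entries scanned to page through totalEntries results when
    // each request re-scans from the start up to the end of its page.
    static long entriesScanned(int totalEntries, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long scanned = 0;
        for (int start = 0; start < totalEntries; start += pageSize) {
            int end = Math.min(start + pageSize, totalEntries);
            scanned += end; // skip 'start' entries, then read up to 'end'
        }
        return scanned;
    }

    public static void main(String[] args) {
        // 10 + 20 + ... + 100 = 550 scans for 100 entries, page size 10
        System.out.println(entriesScanned(100, 10));
    }
}
```

A cursor kept across requests would scan each of the 100 entries once, so
the rebuilt-cursor variant does 5.5 times the work in this example, and the
gap widens as the result set grows.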

Yes, I'm aware of that. And I will certainly not go this way ...

The pros are:
- we don't have to keep n cursors in memory forever.



The whole point of this feature is to maintain state so the search continues
where it left off.  But this should be cheap both for the server and for the
client. This approach is a brute force approach and it's going to mix up a
lot of code in complicated places.

It's OK to hold off on this until we see a better approach.  I'd rather wait
until we feel that eureka light bulb go off.

- from the client POV, it respects the PagedSearch contract
- it's easier to implement as we have less information to keep in the
session, and to restore

The cons are:
- it's time consuming: if we have N entries to return, with a page size of P,
we will construct N/P cursors.


Yes, and there will be costs for the advances.  Both are going to make this
approach limiting.

I'm currently going a bit further in the other direction (ie, storing
the cursor in the session).

There are vicious issues, though. Some of them are related to the way we
have designed the server. For instance, when comparing the previous
searchRequest with the current one, you have to compare attributes, DN
and filters. That's not complicated, except that those elements might
not be equal, just because they have not yet been normalized at this
point (in SearchHandler).

This is a big issue. At this point, we can manage to normalize the DN
and attributes, but for the filter, this is another story. This makes me
think that the Normalize interceptor is not necessary, and that it
should be moved up in the stack (in the codec, in fact).
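
As a rough illustration of comparing two requests only after normalizing
them, here is a sketch with a hypothetical SearchRequestKey helper. The
normalization is deliberately simplistic (lower-casing and whitespace
stripping); real DN and attribute normalization depends on the schema, and
the filter is left out entirely since that is the hard part:

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the server code.
final class SearchRequestKey {
    // Simplistic stand-in for DN normalization: lower-case the DN and
    // strip whitespace around ',' and '='.
    static String normalizeDn(String dn) {
        return dn.trim()
                 .toLowerCase(Locale.ROOT)
                 .replaceAll("\\s*([,=])\\s*", "$1");
    }

    // Attribute names are compared case-insensitively and order does
    // not matter, so collect them lower-cased into a sorted set.
    static Set<String> normalizeAttributes(String... attributes) {
        Set<String> result = new TreeSet<>();
        for (String attribute : attributes) {
            result.add(attribute.trim().toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // True when both requests target the same base DN and attributes
    // once normalized. Filters are NOT compared here.
    static boolean sameTarget(String dn1, String[] attrs1,
                              String dn2, String[] attrs2) {
        return normalizeDn(dn1).equals(normalizeDn(dn2))
            && normalizeAttributes(attrs1).equals(normalizeAttributes(attrs2));
    }
}
```

With this, "OU=Users, DC=example,DC=com" and "ou=users,dc=example,dc=com"
compare as the same base, whereas a raw string comparison would treat them
as two different searches.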

Otherwise, the other problem we have is the Cursor closure. When we are
done with them, we should close those guys. This is easy if the client
behaves correctly (ie, sends a last request with 0 as the number of
elements to return, or if we reach the end of the entries to return), but
if the client doesn't do that, we will end up with potentially thousands of
open cursors in memory.

So we need to add a cleanup thread associated with each session, closing
the cursor after a timeout has occurred.
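
One possible shape for that cleanup, as a sketch only: a hypothetical
IdleCursorReaper that tracks when each cursor was last used and closes the
ones idle for longer than a timeout. The class name and the use of
AutoCloseable as the cursor type are assumptions. Timestamps are passed in
explicitly so the logic can be exercised without a real thread:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of open cursors with idle-timeout eviction.
final class IdleCursorReaper {
    private static final class Tracked {
        final AutoCloseable cursor;
        volatile long lastUsedMillis;

        Tracked(AutoCloseable cursor, long nowMillis) {
            this.cursor = cursor;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Tracked> cursors = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    IdleCursorReaper(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a paged search stores its cursor in the session.
    void register(String cookie, AutoCloseable cursor, long nowMillis) {
        cursors.put(cookie, new Tracked(cursor, nowMillis));
    }

    // Called each time the client asks for another page.
    void touch(String cookie, long nowMillis) {
        Tracked tracked = cursors.get(cookie);
        if (tracked != null) {
            tracked.lastUsedMillis = nowMillis;
        }
    }

    // Closes and removes every cursor idle for at least the timeout.
    // Returns how many cursors were closed.
    int closeExpired(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Tracked>> it = cursors.entrySet().iterator();
        while (it.hasNext()) {
            Tracked tracked = it.next().getValue();
            if (nowMillis - tracked.lastUsedMillis >= timeoutMillis) {
                it.remove();
                try {
                    tracked.cursor.close();
                } catch (Exception ignored) {
                    // a failing close must not stop the sweep
                }
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return cursors.size();
    }
}
```

The cleanup thread itself could then be a task run periodically (for
instance through a ScheduledExecutorService) that calls
closeExpired(System.currentTimeMillis()).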


Those are the two problems I'm currently facing...

Otherwise, the implementation itself is pretty straightforward (well,
not that much, but it's just simple code).


Any ideas on how to handle those two problems?

Alex



--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org
