When you get that 2 second result, have you declared an index on the fields you are sorting on or is this a full sort + count + offset query?

At least from what I can tell of the PostgreSQL implementation of SQL, the optimizer doesn't keep accurate counts for the underlying table so always has to do a table scan to compute counts. So what seems to be happening is a very efficient full index scan and then perhaps some way of combining the sort with an accumulating count with a terminate at offset to bypass doing the full merge (although I doubt it's that efficient:).

One thing we can do in elephant is to avoid deserializing the value part of key-value pairs and just operate on the keys (for index scans or linear count scans). This will help keep the lisp GC memory from thrashing too much.

Ian

On Oct 3, 2007, at 11:07 PM, [EMAIL PROTECTED] wrote:

I have to say that Mariano is hitting some of the issues we will be facing soon as our quest to learn Lisp and Elephant continues and we continue working on migrating some of our SQL-based applications over. This particular need of his is also a real need we have since it's something we offer to our application users. For example, in the application we are working on migrating, we have a table with over 7 million rows. This obviously has many thousands of 50-row pages to navigate through. Our user interface offers the "usual" search, sort by any column(s), and page navigation (First, Last, Next, Previous, or manually input a page number).

The way we handle this in our code is something like SELECT COUNT (*) FROM table_name. From the count, figure out the number of pages. Then compute an offset based on the page the user wants to view (e.g. assuming 50 rows per page and wanting to view page 90, the offset would be (50 * 90) - 1 = 4499) and formulate the SQL query as something like SELECT * FROM table_name ORDER BY {sort_order} OFFSET {computed_offset} LIMIT 50 (note that all this assuming a 50-row page size, and the user also has the ability to change the page size via the web interface)

From a SQL data manipulation language perspective, it's pretty straight forward. From a SQL internal execution path, I really have no idea how it's implemented and don't know if it does any linear scanning to return the results. The fact is that our application allows you to navigate through the 7+ million row table in under 2 seconds per page no matter which page you wish to view or sort order. From a user perspective, 2 seconds for a browser-based screen refresh is more than acceptable. Will Elephant allow to "refresh" as quickly if in the current model it needs to do a linear scan? We haven't gotten there yet, but maybe someone can comment on that.

Thanks

On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:

When you say indexes are not sequential, do you mean UIDs are not sequentially allocated? I think there is a BDB sequence issue that I've never worried about that jumps to the nearest 100 when you reconnect. However, if you create anything other than a user object, you will also have gaps in the UID sequence so that's a fundamental issue. Don't assume anything about UIDs other than the fact that they are unique.

You could create and index your own field which is a sequential ID for creation ordering, but it sounds like you probably want to return a sublist based on some sort order like alphabetical by name or by date. In this case, at least doing the last page is easy, map from end and count the # of users you want before you terminate, but to find an element that is N elements away from the first or last element in less than O(n) time isn't possible with the underlying B-Trees we're using.

The first question is whether you database is guaranteed to be so big that you can't just do this linear time. When you start to face performance issues, then you can look at building that additional data structure.

Otherwise, you will have to implement a data structure that maintains this information on top of the Elephant infrastructure.

The first idea that occurs to me is to drop the idea of using an indexed class or standalone btrees and just build a red-black tree using object slots (you can inherit from a base class that implements the RB tree functionality). This simultaneously solves the count problem and the access element # N problem. The O(log (base 2) N) lookup time will have a higher fixed cost per level traversal, but if you start getting really large dbs (1000's to 10k's?) then it will certainly beat a linear map-index approach. i.e.

http://en.wikipedia.org/wiki/Red-black_tree

There is a lisp example of this data structure here:

http://www.aviduratas.de/lisp/progs/rb-trees.lisp

Now there is a problem that you'll need one of these for each sorted order which for a list sorted many different ways is a problem. Anyone know how SQL query systems implement this?

Just remember that premature optimization is one of the four horseman of the apocalypse for the effective programmer.

Ian
----- Original Message -----
From: Mariano Montone
To: Elephant bugs and development
Sent: Wednesday, October 03, 2007 6:57 PM
Subject: [elephant-devel] Collection paging

Hello, it's me again :S.

I would like to know how I can access persistent collection pages efficiently.

What I'm trying to do is making work a web list component with elephant. The list component is supposed to support well known navigation commands, like look at the collection in pages, support for first, last, next, previous buttons, and display of collection size.

The collection size problem was treated here: http://common- lisp.net/pipermail/elephant-devel/2007-October/001162.html.

But now I have a problem with building the pages.

My first try was:
  (let*
      ((start (* (current-page self) (page-size self)))
       (end (+ start (page-size self)))
       )
        (<:ul
(elephant:map-btree #'(lambda (key elem) (declare (ignore key))
                       (let ((elem-text (make-elem-text self elem)))
                         (<:li
                          (if (slot-value self 'selectable)
(<ucw:a :action (answer elem) (<:as- html elem-text))
                          (<:a (<:as-html elem-text))))))
                 (model self) :start start :end end)
         )

with start and end previously fixed based in the current page number and size.

But I realized indexes were not sequential when I created new objects, as this shows:

ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
  (iter
    (for (values exists? k v) = (cursor-next cursor))
    (while exists?)
    (format *standard-output* "~A -> ~A ~%" k v)))
2 -> #<USER name: dssdf {B043379}>
3 -> #<USER name: ttttt {B045C69}>
5 -> #<USER name: ff {B048179}>
6 -> #<USER name: other {B04A451}>
7 -> #<USER name: guest {AD61271}>
100 -> #<USER name: qqq {B053001}>
101 -> #<USER name:  {B055721}>
102 -> #<USER name:  {B057E01}>
103 -> #<USER name:  {B05A529}>
104 -> #<USER name:  {B05CCF1}>
105 -> #<USER name:  {B05F579}>
106 -> #<USER name:  {B063E91}>
107 -> #<USER name: qqq {B066851}>
200 -> #<USER name:  {B069519}>
201 -> #<USER name:  {B06C009}>
300 -> #<USER name:  {B06EBA1}>
301 -> #<USER name: aaa {B0717D1}>
NIL

I don't think this is a bug, it must have to do with how Elephant manages btrees; but then how am I supposed to access through pages? I would like to have to access all the objects from the beggining just to discard them instantly (imagine a large collection and the user wanting to see the last page).

Thank you again :)

Mariano


_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Reply via email to