When you get that 2-second result, have you declared an index on the
fields you are sorting on, or is this a full sort + count + offset query?
From what I can tell of the PostgreSQL implementation of SQL, the
optimizer doesn't keep accurate row counts for the underlying table,
so it always has to do a table scan to compute counts. So what seems
to be happening is a very efficient full index scan, then perhaps some
way of combining the sort with an accumulating count that terminates
at the offset to bypass doing the full merge (although I doubt it's
that efficient :).
One thing we can do in Elephant is to avoid deserializing the value
part of key-value pairs and operate only on the keys (for index scans
or linear count scans). This will help keep the Lisp GC from thrashing
too much.
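For illustration, a count that walks an index with a cursor and ignores
the values might look like the sketch below. It reuses the cursor API
that Mariano's example further down already demonstrates
(with-btree-cursor, cursor-next, find-class-index); actually skipping
the deserialization of the value would still need support inside
Elephant, so this only shows the shape of the loop:

(defun count-index-entries (class-name)
  ;; Sketch only: walk the class index with a cursor and count
  ;; entries, never touching the value half of each pair in Lisp code.
  (elephant:with-btree-cursor (cur (elephant:find-class-index class-name))
    (loop for exists? = (elephant:cursor-next cur)
          while exists?
          count t)))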
Ian
On Oct 3, 2007, at 11:07 PM, [EMAIL PROTECTED] wrote:
I have to say that Mariano is hitting some of the issues we will be
facing soon as our quest to learn Lisp and Elephant continues and we
keep working on migrating some of our SQL-based applications over.
This particular need of his is also a real need of ours, since it's
something we offer to our application users. For example, in the
application we are working on migrating, we have a table with over
7 million rows. That obviously means many thousands of 50-row pages
to navigate through. Our user interface offers the "usual" search,
sort by any column(s), and page navigation (First, Last, Next,
Previous, or manually entering a page number).
The way we handle this in our code is something like SELECT COUNT(*)
FROM table_name. From the count, we figure out the number of pages.
Then we compute an offset based on the page the user wants to view
(e.g. assuming 50 rows per page and wanting to view page 90, the
offset would be (50 * 90) - 1 = 4499) and formulate the SQL query as
something like SELECT * FROM table_name ORDER BY {sort_order}
OFFSET {computed_offset} LIMIT 50 (note that all this assumes a
50-row page size; the user also has the ability to change the page
size via the web interface).
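For reference, the page arithmetic could be sketched in Lisp roughly as
below. This is illustrative only, not code from our application; it
uses the usual convention of skipping page-size * (page - 1) rows for a
1-based page number, which differs by one from the figure quoted above.

(defun paging-query (total-rows page-number &key (page-size 50) (sort-order "name"))
  ;; Hypothetical helper: return the number of pages and a query
  ;; string for the requested page.  Table and column names are
  ;; placeholders.
  (let ((pages  (ceiling total-rows page-size))
        (offset (* page-size (1- page-number))))
    (values pages
            (format nil "SELECT * FROM table_name ORDER BY ~A OFFSET ~D LIMIT ~D"
                    sort-order offset page-size))))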
From a SQL data-manipulation-language perspective, it's pretty
straightforward. From a SQL internal execution-path perspective, I
really have no idea how it's implemented and don't know whether it
does any linear scanning to return the results. The fact is that our
application lets you navigate through the 7+ million row table in
under 2 seconds per page, no matter which page you wish to view or
which sort order you choose. From a user perspective, 2 seconds for a
browser-based screen refresh is more than acceptable. Will Elephant
allow us to "refresh" as quickly if, in the current model, it needs to
do a linear scan? We haven't gotten there yet, but maybe someone can
comment on that.
Thanks
On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:
When you say indexes are not sequential, do you mean UIDs are not
sequentially allocated? I think there is a BDB sequence issue, which
I've never worried about, that jumps to the nearest 100 when you
reconnect. However, if you create anything other than a user object,
you will also have gaps in the UID sequence, so that's a fundamental
issue. Don't assume anything about UIDs other than the fact that they
are unique.
You could create and index your own field that is a sequential ID for
creation ordering, but it sounds like you probably want to return a
sublist based on some sort order, like alphabetical by name or by
date. In that case, at least doing the last page is easy: map from the
end and count off the number of users you want before you terminate
(see the sketch below). But finding an element that is N elements away
from the first or last element in less than O(n) time isn't possible
with the underlying B-trees we're using.
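A minimal sketch of that "map from the end" idea, assuming cursor-last
and cursor-prev return the same (values exists? key value) triple that
cursor-next does in Mariano's example below:

(defun last-page (class-name &key (page-size 50))
  ;; Walk backwards from the end of the class index, collecting up to
  ;; PAGE-SIZE values; PUSH leaves them in ascending display order.
  (elephant:with-btree-cursor (cur (elephant:find-class-index class-name))
    (let ((page '())
          (move #'elephant:cursor-last))   ; first step jumps to the end
      (loop
        (multiple-value-bind (exists? key value) (funcall move cur)
          (declare (ignore key))
          (when (or (not exists?) (>= (length page) page-size))
            (return page))
          (push value page)
          (setf move #'elephant:cursor-prev))))))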
The first question is whether your database is guaranteed to be so
big that you can't just do this in linear time. When you start to
face performance issues, then you can look at building that
additional data structure.
Otherwise, you will have to implement a data structure that maintains
this information on top of the Elephant infrastructure. The first idea
that occurs to me is to drop the idea of using an indexed class or
standalone btrees and just build a red-black tree using object slots
(you can inherit from a base class that implements the RB-tree
functionality). This simultaneously solves the count problem and the
access-element-N problem. The O(log2 N) lookup time will have a higher
fixed cost per level traversal, but if you start getting really large
DBs (1000's to 10k's?) then it will certainly beat a linear map-index
approach.
i.e.
http://en.wikipedia.org/wiki/Red-black_tree
There is a lisp example of this data structure here:
http://www.aviduratas.de/lisp/progs/rb-trees.lisp
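As a very rough skeleton of that direction (the names are hypothetical,
the rebalancing logic would come from an RB-tree implementation like
the one linked above, and the subtree-size slot is my addition; it's
the augmentation that actually makes counting and "element number N"
lookups O(log N)):

(defclass rb-node ()
  ((key   :initarg :key   :accessor node-key)
   (value :initarg :value :accessor node-value)
   (color :initarg :color :accessor node-color :initform :red)
   (left  :initarg :left  :accessor node-left  :initform nil)
   (right :initarg :right :accessor node-right :initform nil)
   ;; Subtree size: maintained on insert/delete so that counts and
   ;; "give me element #N" can be answered in O(log N).
   (size  :initarg :size  :accessor node-size  :initform 1))
  (:metaclass elephant:persistent-metaclass))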
Now there is a catch: you'll need one of these trees for each sort
order, which for a list sorted many different ways is a problem.
Anyone know how SQL query systems implement this?
Just remember that premature optimization is one of the four
horsemen of the apocalypse for the effective programmer.
Ian
----- Original Message -----
From: Mariano Montone
To: Elephant bugs and development
Sent: Wednesday, October 03, 2007 6:57 PM
Subject: [elephant-devel] Collection paging
Hello, it's me again :S.
I would like to know how I can access persistent collection pages
efficiently.
What I'm trying to do is make a web list component work with
Elephant. The list component is supposed to support the well-known
navigation commands: viewing the collection in pages, First, Last,
Next, and Previous buttons, and display of the collection size.
The collection size problem was treated here: http://common-lisp.net/pipermail/elephant-devel/2007-October/001162.html.
But now I have a problem with building the pages.
My first try was:
(let* ((start (* (current-page self) (page-size self)))
       (end (+ start (page-size self))))
  (<:ul
   (elephant:map-btree
    #'(lambda (key elem)
        (declare (ignore key))
        (let ((elem-text (make-elem-text self elem)))
          (<:li
           (if (slot-value self 'selectable)
               (<ucw:a :action (answer elem) (<:as-html elem-text))
               (<:a (<:as-html elem-text))))))
    (model self) :start start :end end)))
with start and end computed beforehand from the current page number
and page size.
But I realized indexes were not sequential when I created new
objects, as this shows:
ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
         (iter
           (for (values exists? k v) = (cursor-next cursor))
           (while exists?)
           (format *standard-output* "~A -> ~A ~%" k v)))
2 -> #<USER name: dssdf {B043379}>
3 -> #<USER name: ttttt {B045C69}>
5 -> #<USER name: ff {B048179}>
6 -> #<USER name: other {B04A451}>
7 -> #<USER name: guest {AD61271}>
100 -> #<USER name: qqq {B053001}>
101 -> #<USER name: {B055721}>
102 -> #<USER name: {B057E01}>
103 -> #<USER name: {B05A529}>
104 -> #<USER name: {B05CCF1}>
105 -> #<USER name: {B05F579}>
106 -> #<USER name: {B063E91}>
107 -> #<USER name: qqq {B066851}>
200 -> #<USER name: {B069519}>
201 -> #<USER name: {B06C009}>
300 -> #<USER name: {B06EBA1}>
301 -> #<USER name: aaa {B0717D1}>
NIL
I don't think this is a bug; it must have to do with how Elephant
manages btrees. But then, how am I supposed to access it in pages?
I wouldn't want to have to access all the objects from the beginning
just to discard them immediately (imagine a large collection and the
user wanting to see the last page).
Thank you again :)
Mariano
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel