When you get that 2-second result, have you declared an index on the
fields you are sorting on, or is this a full sort + count + offset query?
From what I can tell of the PostgreSQL implementation of SQL, the
optimizer doesn't keep accurate row counts for the underlying table,
so it always has to do a table scan to compute counts. So what seems
to be happening is a very efficient full index scan, then perhaps some
way of combining the sort with an accumulating count that terminates
at the offset to bypass doing the full merge (although I doubt it's
that efficient :).
One thing we can do in Elephant is to avoid deserializing the value
part of key-value pairs and operate only on the keys (for index scans
or linear count scans). This will help keep the Lisp GC from thrashing
too much.
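For illustration, a count that walks an index with a cursor and ignores
the values might look like the sketch below. It reuses the cursor API
that Mariano's example further down already demonstrates
(with-btree-cursor, cursor-next, find-class-index); actually skipping
the deserialization of the value would still need support inside
Elephant, so this only shows the shape of the loop:

(defun count-index-entries (class-name)
  ;; Sketch only: walk the class index with a cursor and count
  ;; entries, never touching the value half of each pair in Lisp code.
  (elephant:with-btree-cursor (cur (elephant:find-class-index class-name))
    (loop for exists? = (elephant:cursor-next cur)
          while exists?
          count t)))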
Ian
On Oct 3, 2007, at 11:07 PM, [EMAIL PROTECTED] wrote:
I have to say that Mariano is hitting some of the issues we will be
facing soon as our quest to learn Lisp and Elephant continues and we
keep working on migrating some of our SQL-based applications over.
This particular need of his is also a real need of ours, since it's
something we offer to our application users. For example, in the
application we are working on migrating, we have a table with over
7 million rows. That obviously means many thousands of 50-row pages
to navigate through. Our user interface offers the "usual" search,
sort by any column(s), and page navigation (First, Last, Next,
Previous, or manually entering a page number).
The way we handle this in our code is something like SELECT COUNT(*)
FROM table_name. From the count, we figure out the number of pages.
Then we compute an offset based on the page the user wants to view
(e.g. assuming 50 rows per page and wanting to view page 90, the
offset would be (50 * 90) - 1 = 4499) and formulate the SQL query as
something like SELECT * FROM table_name ORDER BY {sort_order}
OFFSET {computed_offset} LIMIT 50 (note that all this assumes a
50-row page size; the user also has the ability to change the page
size via the web interface).
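For reference, the page arithmetic could be sketched in Lisp roughly as
below. This is illustrative only, not code from our application; it
uses the usual convention of skipping page-size * (page - 1) rows for a
1-based page number, which differs by one from the figure quoted above.

(defun paging-query (total-rows page-number &key (page-size 50) (sort-order "name"))
  ;; Hypothetical helper: return the number of pages and a query
  ;; string for the requested page.  Table and column names are
  ;; placeholders.
  (let ((pages  (ceiling total-rows page-size))
        (offset (* page-size (1- page-number))))
    (values pages
            (format nil "SELECT * FROM table_name ORDER BY ~A OFFSET ~D LIMIT ~D"
                    sort-order offset page-size))))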
From a SQL data-manipulation-language perspective, it's pretty
straightforward. From a SQL internal execution-path perspective, I
really have no idea how it's implemented and don't know whether it
does any linear scanning to return the results. The fact is that our
application lets you navigate through the 7+ million row table in
under 2 seconds per page, no matter which page you wish to view or
which sort order you choose. From a user perspective, 2 seconds for a
browser-based screen refresh is more than acceptable. Will Elephant
allow us to "refresh" as quickly if, in the current model, it needs to
do a linear scan? We haven't gotten there yet, but maybe someone can
comment on that.
Thanks
On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:
When you say indexes are not sequential, do you mean UIDs are not
sequentially allocated? I think there is a BDB sequence issue, which
I've never worried about, that jumps to the nearest 100 when you
reconnect. However, if you create anything other than a user object,
you will also have gaps in the UID sequence, so that's a fundamental
issue. Don't assume anything about UIDs other than the fact that they
are unique.
You could create and index your own field that is a sequential ID for
creation ordering, but it sounds like you probably want to return a
sublist based on some sort order, like alphabetical by name or by
date. In that case, at least doing the last page is easy: map from the
end and count off the number of users you want before you terminate
(see the sketch below). But finding an element that is N elements away
from the first or last element in less than O(n) time isn't possible
with the underlying B-trees we're using.
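A minimal sketch of that "map from the end" idea, assuming cursor-last
and cursor-prev return the same (values exists? key value) triple that
cursor-next does in Mariano's example below:

(defun last-page (class-name &key (page-size 50))
  ;; Walk backwards from the end of the class index, collecting up to
  ;; PAGE-SIZE values; PUSH leaves them in ascending display order.
  (elephant:with-btree-cursor (cur (elephant:find-class-index class-name))
    (let ((page '())
          (move #'elephant:cursor-last))   ; first step jumps to the end
      (loop
        (multiple-value-bind (exists? key value) (funcall move cur)
          (declare (ignore key))
          (when (or (not exists?) (>= (length page) page-size))
            (return page))
          (push value page)
          (setf move #'elephant:cursor-prev))))))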
The first question is whether your database is guaranteed to be so
big that you can't just do this in linear time. When you start to
face performance issues, then you can look at building that
additional data structure.
Otherwise, you will have to implement a data structure that maintains
this information on top of the Elephant infrastructure. The first idea
that occurs to me is to drop the idea of using an indexed class or
standalone btrees and just build a red-black tree using object slots
(you can inherit from a base class that implements the RB-tree
functionality). This simultaneously solves the count problem and the
access-element-N problem. The O(log2 N) lookup time will have a higher
fixed cost per level traversal, but if you start getting really large
DBs (1000's to 10k's?) then it will certainly beat a linear map-index
approach.
i.e.
http://en.wikipedia.org/wiki/Red-black_tree
There is a lisp example of this data structure here:
http://www.aviduratas.de/lisp/progs/rb-trees.lisp
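As a very rough skeleton of that direction (the names are hypothetical,
the rebalancing logic would come from an RB-tree implementation like
the one linked above, and the subtree-size slot is my addition; it's
the augmentation that actually makes counting and "element number N"
lookups O(log N)):

(defclass rb-node ()
  ((key   :initarg :key   :accessor node-key)
   (value :initarg :value :accessor node-value)
   (color :initarg :color :accessor node-color :initform :red)
   (left  :initarg :left  :accessor node-left  :initform nil)
   (right :initarg :right :accessor node-right :initform nil)
   ;; Subtree size: maintained on insert/delete so that counts and
   ;; "give me element #N" can be answered in O(log N).
   (size  :initarg :size  :accessor node-size  :initform 1))
  (:metaclass elephant:persistent-metaclass))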
Now there is a catch: you'll need one of these trees for each sort
order, which for a list sorted many different ways is a problem.
Anyone know how SQL query systems implement this?
Just remember that premature optimization is one of the four
horsemen of the apocalypse for the effective programmer.
Ian
----- Original Message -----
From: Mariano Montone
To: Elephant bugs and development
Sent: Wednesday, October 03, 2007 6:57 PM
Subject: [elephant-devel] Collection paging
Hello, it's me again :S.
I would like to know how I can access persistent collection pages
efficiently.
What I'm trying to do is make a web list component work with
Elephant. The list component is supposed to support the well-known
navigation commands: viewing the collection in pages, First, Last,
Next, and Previous buttons, and display of the collection size.
The collection size problem was treated here: http://common-lisp.net/pipermail/elephant-devel/2007-October/001162.html.
But now I have a problem with building the pages.
My first try was:
(let* ((start (* (current-page self) (page-size self)))
       (end (+ start (page-size self))))
  (<:ul
   (elephant:map-btree
    #'(lambda (key elem)
        (declare (ignore key))
        (let ((elem-text (make-elem-text self elem)))
          (<:li
           (if (slot-value self 'selectable)
               (<ucw:a :action (answer elem) (<:as-html elem-text))
               (<:a (<:as-html elem-text))))))
    (model self) :start start :end end)))
with start and end computed beforehand from the current page number
and page size.
But I realized indexes were not sequential when I created new
objects, as this shows:
ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
         (iter
           (for (values exists? k v) = (cursor-next cursor))
           (while exists?)
           (format *standard-output* "~A -> ~A ~%" k v)))
2 -> #<USER name: dssdf {B043379}>
3 -> #<USER name: ttttt {B045C69}>
5 -> #<USER name: ff {B048179}>
6 -> #<USER name: other {B04A451}>
7 -> #<USER name: guest {AD61271}>
100 -> #<USER name: qqq {B053001}>
101 -> #<USER name: {B055721}>
102 -> #<USER name: {B057E01}>
103 -> #<USER name: {B05A529}>
104 -> #<USER name: {B05CCF1}>
105 -> #<USER name: {B05F579}>
106 -> #<USER name: {B063E91}>
107 -> #<USER name: qqq {B066851}>
200 -> #<USER name: {B069519}>
201 -> #<USER name: {B06C009}>
300 -> #<USER name: {B06EBA1}>
301 -> #<USER name: aaa {B0717D1}>
NIL
I don't think this is a bug; it must have to do with how Elephant
manages btrees. But then, how am I supposed to access it in pages?
I wouldn't want to have to access all the objects from the beginning
just to discard them immediately (imagine a large collection and the
user wanting to see the last page).
Thank you again :)
Mariano
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel