Are you running from HEAD or 0.6.0?  I'll answer the size question tonight (I'm 
traveling today).  -Ian

-----Original Message-----
From: Daniel Salama <[EMAIL PROTECTED]>
Date: Mon, 13 Nov 2006 13:43:03 
To:Elephant bugs and development <elephant-devel@common-lisp.net>
Subject: Re: [elephant-devel] Querying Advice [w/code example]

Ok.


I got Elephant to work again with SBCL on PPC. I guess I was still using BDB 
4.3 when 4.4 seems to be required. I couldn't find that anywhere in the docs.


I see what you're saying and can start to envision where this can go. I will 
keep playing and provide more feedback later today.


FYI, out of curiosity, I just ran (random-users 5000) on an empty 
database and it took 5 min 56 secs to complete. At the end, the %ELEPHANT file 
was 9MB in size and both log files were 10MB. That's kind of big considering 
what it does.


I then made the random-users2 function to look like:


(defun random-users2 (n)
  (setq *auto-commit* nil)
  (with-transaction ()
    (dotimes (x n)
      (let ((u (make-instance 
                'User
                :uname (format nil "user~A" x)
                :pword (random-password)
                :email (format nil "[EMAIL PROTECTED]" x)
                :fullname (format nil "~A~A ~A~A"
                                  (random-password) x
                                  (random-password) x)
                :balance (random 100))))
        (add-to-root x u))))
  (setq *auto-commit* t))


When I ran (random-users2 5000) I got a Berkeley DB Error: Cannot allocate 
memory, so I changed random-users2 to:


(defun random-users2 (n)
  (setq *auto-commit* nil)
  (start-ele-transaction)
  (dotimes (x n)
    ;; Commit every 1000 inserts so no single transaction grows too large.
    (when (zerop (mod x 1000))
      (commit-transaction)
      (start-ele-transaction))
    (let ((u (make-instance 
              'User
              :uname (format nil "user~A" x)
              :pword (random-password)
              :email (format nil "[EMAIL PROTECTED]" x)
              :fullname (format nil "~A~A ~A~A"
                                (random-password) x
                                (random-password) x)
              :balance (random 100))))
      (add-to-root x u)))
  (commit-transaction)
  (setq *auto-commit* t))


When I ran (random-users2 5000) on an empty database, this time it only took 25 
secs to complete. The %ELEPHANT file was still 9MB in size and both log files 
were also 10MB.


I suppose your code was designed for illustration rather than performance. 
Needless to say, knowing how to use transactions certainly helps and 
can dramatically affect application performance.


My only concern at this moment (which I also mentioned in another email) is the 
size of the data files. Whether that only reflects the persistent 
storage and not necessarily the memory footprint, I don't know. Either way, if I 
loaded my database with 650,000 customer records, the data files would easily 
exceed 1GB of storage, and that's just one "table".
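
A rough check of that estimate, as a minimal sketch. It assumes the data file 
grows linearly with the number of records, which need not hold exactly for a 
B-tree store, and it ignores the log files. The function name 
estimated-size-mb is made up for this illustration.

```lisp
;; Back-of-the-envelope estimate only. Assumption: file size scales
;; linearly with record count (9MB measured for 5000 records above).
(defun estimated-size-mb (measured-mb measured-records target-records)
  "Scale a measured file size linearly to TARGET-RECORDS."
  (* measured-mb (/ target-records measured-records)))

(estimated-size-mb 9 5000 650000)   ; => 1170 (MB)
```

That works out to roughly 1.8KB per record, or a bit over 1GB for 650,000 
records under the linear assumption.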


Thanks,
Daniel



On Nov 12, 2006, at 6:45 PM, Robert L. Read wrote:
 Dear Daniel and Team,
 
     I think the code below, which I have tested on SBCL, illustrates a typical 
problem of the kind that Daniel Salama introduced.  To paraphrase, you have a 
datatype (perhaps compound) which has a lot of slots; you have a GUI, perhaps 
web-based, that you use both to select or filter the large database and to 
decide how to present and sort the results.  I've written the example below as 
if you were operating directly on the slots.  The fact that there are often 
intervening functions does not fundamentally change the problem.  (An example 
of this is storing a timestamp as an integer, but presenting it in a 
human-readable format.)
     SQL supports a powerful querying ability based on both selection and 
sorting.  One might think that this is an advantage of SQL; the conventional 
reasoning is that this is actually an advantage of using a relational database.  
However, since LISP treats functions as first-class citizens that can be 
constructed dynamically, you actually have full Turing-complete capabilities 
in doing queries that SQL cannot match.  This same ability applies to sorting; 
you can sort on any lexical order that you can program.
     In practice, however, one doesn't always need this power.  More typically, 
a user will select fields that they want to use to filter the results (that is, 
construct a query from), and perhaps how they would like the results to be 
sorted.  I assume that you know how to interpret an HTTP query or a McCLIM user 
interface or something to associate the GUI with underlying functions.  (My 
personal framework has a way to do this, and UCW is probably the most common or 
famous way to do it now.)
     The code below generates 100 random "users".  The bare act of defining 
this class defines accessor-functions that we can use in dynamically 
constructed lists as below: (list 'username-of 'balance-of).  I have written 
very small functions that use such lists to define 
"lexicographic" sort orders based on the order of the functions within the 
list.  That is, the primary sort criterion is the first function in the list, 
but if that function is equal for two values, the next is used, and so on.  If 
you load the below code and execute (show-off) several times I think you will 
see what I mean.  You can then see how easily you can change the list of 
functions that are either in the selector or the sort criteria. If this is a 
web-based app, these lists will be generated from the http-query, which is 
generated by the user's clicks.
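
     The lexicographic ordering described above can be sketched roughly as 
follows.  This is only an illustration, not the code attached to this message: 
the user class here is a plain, non-persistent stand-in, and value-lessp, 
lex-lessp, sort-by and *users* are names made up for the sketch.  It only 
handles slots that hold numbers or strings.

```lisp
;; Plain CLOS stand-in for the user class (assumption: the real class
;; would be an Elephant persistent class with the same accessors).
(defclass user ()
  ((username :initarg :username :accessor username-of)
   (balance  :initarg :balance  :accessor balance-of)))

(defun value-lessp (x y)
  "Order two slot values; handles only reals and strings."
  (etypecase x
    (real   (< x y))
    (string (and (string< x y) t))))

(defun lex-lessp (accessors a b)
  "True if A sorts before B, comparing by each accessor in turn."
  (dolist (fn accessors nil)            ; NIL when all keys are equal
    (let ((x (funcall fn a))
          (y (funcall fn b)))
      (cond ((value-lessp x y) (return t))
            ((value-lessp y x) (return nil))))))

(defun sort-by (accessors items)
  "Return a fresh list of ITEMS sorted lexicographically by ACCESSORS."
  (sort (copy-list items)
        (lambda (a b) (lex-lessp accessors a b))))

(defparameter *users*
  (list (make-instance 'user :username "carol" :balance 10)
        (make-instance 'user :username "alice" :balance 50)
        (make-instance 'user :username "bob"   :balance 10)))

;; Sort by balance first, then by username to break ties.
(mapcar #'username-of (sort-by (list 'balance-of 'username-of) *users*))
;; => ("bob" "carol" "alice")
```

     Changing the sort order is then just a matter of passing a different 
list, for example (list 'username-of) alone.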
     That is a "columnar" approach; but one can do something similar 
but more powerful based on computed functions that aren't based on individual 
columns, but on the entire data element.  For example "find users whose 
username is equal to their password" cannot be done in this way --- but can be 
done by just using a function #'(lambda (x) (equal (username-of x) (password-of 
x))).  SQL can do this --- but LISP could use any function there, such as 
"find users who have both short usernames and passwords that can be cracked by 
routine-x".
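
     As a minimal sketch of selecting with an arbitrary predicate over the 
whole element: the class below is again a plain, non-persistent stand-in, and 
select-users and *sample-users* are hypothetical names used only here.  It 
filters an in-memory list rather than walking the database.

```lisp
;; Plain CLOS stand-in for the user class (an assumption for this sketch).
(defclass user ()
  ((username :initarg :username :accessor username-of)
   (password :initarg :password :accessor password-of)))

(defun select-users (predicate users)
  "Return the users for which PREDICATE returns true."
  (remove-if-not predicate users))

(defparameter *sample-users*
  (list (make-instance 'user :username "alice" :password "alice")
        (make-instance 'user :username "bob"   :password "s3cret")))

;; "Find users whose username is equal to their password."
(mapcar #'username-of
        (select-users (lambda (x) (equal (username-of x) (password-of x)))
                      *sample-users*))
;; => ("alice")
```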
     Instead of adding things to the root or the store-controller directly, one 
would generally prefer
 to use persistent classes:
 
http://common-lisp.net/project/elephant/doc/Persistent-Classes.html#Persistent-Classes
 
 This doesn't change the nature of the problem.  If you like, you can create an 
index on any slot in a very convenient way: 
http://common-lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices
 This holds out the possibility of NOT having to iterate over the entire 
data-set, but rather homing in directly upon 
 matching values. (You can in fact create functional indexes based on any 
function at all in Elephant, which is something that SQL can't do conveniently, 
but the times you will need to do this are rare.)
 
 It would take maybe 20 minutes more coding to use Ian's 
"get-instances-by-range" directly to perform the query 
(if the GUI elements correspond to the class's slots).   
This would be very efficient; but of course you should not do this until you 
know that this is really the bottleneck in your system.  By using cursors, you 
can avoid reading the entire data into memory, and thus process huge datasets.
     However, one should note that Ian's code makes creating indices on slots 
zero-effort; but indexes always have overhead.  The real question is: when will 
your queries actually utilize the index?  (That is, if you always select on one 
column/slot, then that one should be indexed....but if your query pattern is 
more complicated, it becomes fuzzy.)
     Let me know if you find this useful;  after I get feedback from you and 
Ian has made his post, perhaps we will put
 this in the documentation.
 
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel