Robert makes an excellent point. For datasets that fit in memory, caching objects and slot values in memory makes the use of lisp as a query language really easy.

Another (unreleased) prevalence-like facility in Elephant:

In src/contrib/eslick/snapshot-set.lisp there is a simple object-caching model that works for non-persistent objects. It lets you register objects with a special hash table as 'root' objects. This hash can be saved and restored, and it stores the root objects plus all objects 'reachable' from the root set. The notion of reachability can be overloaded, but for now it is defined recursively over any standard object or hash table found in a slot of a reachable object. The whole snapshot-set facility is about 300 lines of code, so it's pretty easy to read as an example.
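The reachability walk can be sketched roughly like this (a simplified, hypothetical version, not the actual snapshot-set.lisp code; it assumes the closer-mop portability layer for slot introspection):

```lisp
;; Hypothetical sketch of the "reachable" walk described above:
;; collect every standard-object or hash-table found in the slots
;; of objects reachable from a set of roots.  Requires closer-mop.
(defun collect-reachable (roots)
  (let ((seen (make-hash-table :test 'eq)))
    (labels ((walk (obj)
               (when (and obj (not (gethash obj seen)))
                 (typecase obj
                   (standard-object
                    (setf (gethash obj seen) t)
                    (dolist (slot (closer-mop:class-slots (class-of obj)))
                      (let ((name (closer-mop:slot-definition-name slot)))
                        (when (slot-boundp obj name)
                          (walk (slot-value obj name))))))
                   (hash-table
                    (setf (gethash obj seen) t)
                    (maphash (lambda (k v) (walk k) (walk v)) obj))))))
      (mapc #'walk roots))
    seen))
```

The real code also has to handle serializing the collected set to the store, which this sketch omits.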

A potential proposal:

It's also fairly easy to add a special cached-persistent-slot that caches its values and implements a write-through policy. This lets you keep all your slot accesses in memory (making object-based search very efficient) while still exploiting on-disk BTrees for indexing when you need them.
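A minimal sketch of the write-through idea, with a plain hash table standing in for Elephant's on-disk BTree (the class and accessor names are invented for illustration; a real implementation would do this at the metaclass level via slot-value-using-class):

```lisp
;; Hypothetical write-through cached slot.  Reads hit the in-memory
;; slot (the cache); writes update both the cache and the "store".
(defvar *store* (make-hash-table :test 'equal))  ; stands in for the BTree

(defclass cached-person ()
  ((name :initarg :name :reader person-name)
   (age  :initarg :age)))

(defmethod person-age ((p cached-person))
  ;; read from the in-memory slot -- no database hit
  (slot-value p 'age))

(defmethod (setf person-age) (new-age (p cached-person))
  ;; write-through: update the cache, then flush to the store
  (setf (slot-value p 'age) new-age)
  (setf (gethash (person-name p) *store*) new-age)
  new-age)
```

With this policy every read is a memory access, and the store only sees writes, which is exactly the trade-off discussed below for read-heavy workloads.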

You'd have to think through the implications of this strategy, though. It works great if your data is read-only or only operated on from one thread. If your read-oriented algorithms can tolerate some incoherence (a slot value may be changed at any time), then you can ignore threading issues.

(Hmmmm... one hack might be to force a database read of cached slots when you are inside a transaction, so you can guarantee that any write to that page in a parallel transaction results in a restart. If you are just doing auto-commit, the read goes to the cached value.)

Ian

On Mar 6, 2008, at 10:02 PM, Robert L. Read wrote:

On Thu, 2008-03-06 at 10:10 -0500, Ian Eslick wrote:
I agree with Robert.  The best way to start is to use lisp as a query language and essentially do a search/match over the object graph.

The rub comes when you start looking at performance.  A linear scan of

I neglected to mention that in my use of Elephant, when I was attempting to run a commercial website, I was using the Data Collection Management
(DCM) stuff that you can find in the contrib/rread directory of the
project.

This system provides strategy-based directors.  That is, there is a
basic factory object for each collection of objects that implements
basic Create, Read, Update, Delete operations.

When you initialize a director, you specify a storage strategy:

*) In-memory hash (no persistence, for transient objects)
*) Elephant (no caching)
*) Cache backed by Elephant (read in memory, with writes immediately
flushed to the store)
*) Generational system, in which each generation can have its own
storage strategy.
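In use, picking a strategy might look something like the following sketch (all names here are illustrative, not the real DCM API from contrib/rread; each strategy is a class implementing the same Create/Read/Update/Delete protocol):

```lisp
;; Illustrative sketch only -- not the actual DCM code.
(defgeneric lookup-item (director key))
(defgeneric store-item  (director key object))

(defclass hash-director ()          ; in-memory hash, no persistence
  ((table :initform (make-hash-table :test 'equal)
          :reader director-table)))

(defclass caching-director (hash-director)
  ())  ; would also hold a handle to the Elephant store

(defmethod lookup-item ((d hash-director) key)
  ;; reads are always served from memory
  (gethash key (director-table d)))

(defmethod store-item ((d hash-director) key object)
  (setf (gethash key (director-table d)) object))

(defmethod store-item :after ((d caching-director) key object)
  ;; write-through: after updating the cache, flush to Elephant
  (declare (ignore key object))
  ;; (flush-to-store d key object) -- hypothetical, elided here
  )
```

The generational strategy would just be another director class whose methods delegate to a per-generation director.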

Everything Ian wrote in the last email about scanning and locality of
reference makes perfect sense, but it assumes that you don't have every
object cached.  That approach is therefore not very "Prevalence"-like in
its performance, though it is very "Prevalence"-like in its convenience.
Using DCM, or any other caching scheme where most of the objects are
cached, tends to give you the performance described in the IBM article
on Prevalence that I referenced.

However, DCM was written BEFORE Ian got the class indexing and
persistence working.  DCM is not nearly as pretty and clean as the
persistent classes.  You end up having to make storage decisions
yourself.

A perfect system might be persistent classes with really excellent
control over the caching/write-updating policy.

For any application, I would recommend using Ian's persistent classes
in the beginning stages of the project, and then, when your performance
tests reveal a problem, considering at that point whether to add
indexes, move to explicitly keeping a class in memory, or some other
solution.



_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel