Robert makes an excellent point. For datasets that fit in memory, caching objects and slot values in memory makes the use of lisp as a query language really easy.

Another (unreleased) prevalence-like facility in Elephant:

In src/contrib/eslick/snapshot-set.lisp there is a simple object-caching model that works for non-persistent objects. It lets you register objects with a special hash table as 'root' objects. This hash can be saved and restored, and it stores the root objects plus all objects 'reachable' from the root set. The notion of reachability can be overloaded, but for now it is defined recursively over any standard object or hash table found in a slot of a reachable object. The whole snapshot-set facility is about 300 lines of code, so it's pretty easy to read as an example.
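The reachability walk can be sketched roughly like this (a simplified, hypothetical version, not the actual snapshot-set.lisp code; it assumes the closer-mop portability layer for slot introspection):

```lisp
;; Hypothetical sketch of the "reachable" walk described above:
;; collect every standard-object or hash-table found in the slots
;; of objects reachable from a set of roots.  Requires closer-mop.
(defun collect-reachable (roots)
  (let ((seen (make-hash-table :test 'eq)))
    (labels ((walk (obj)
               (when (and obj (not (gethash obj seen)))
                 (typecase obj
                   (standard-object
                    (setf (gethash obj seen) t)
                    (dolist (slot (closer-mop:class-slots (class-of obj)))
                      (let ((name (closer-mop:slot-definition-name slot)))
                        (when (slot-boundp obj name)
                          (walk (slot-value obj name))))))
                   (hash-table
                    (setf (gethash obj seen) t)
                    (maphash (lambda (k v) (walk k) (walk v)) obj))))))
      (mapc #'walk roots))
    seen))
```

The real code also has to handle serializing the collected set to the store, which this sketch omits.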

A potential proposal:

It's also fairly easy to add a special cached-persistent-slot that caches its values and implements a write-through policy. This lets you keep all your slot accesses in memory (making object-based search very efficient) while still exploiting on-disk BTrees for indexing when you need them.
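A minimal sketch of the write-through idea, with a plain hash table standing in for Elephant's on-disk BTree (the class and accessor names are invented for illustration; a real implementation would do this at the metaclass level via slot-value-using-class):

```lisp
;; Hypothetical write-through cached slot.  Reads hit the in-memory
;; slot (the cache); writes update both the cache and the "store".
(defvar *store* (make-hash-table :test 'equal))  ; stands in for the BTree

(defclass cached-person ()
  ((name :initarg :name :reader person-name)
   (age  :initarg :age)))

(defmethod person-age ((p cached-person))
  ;; read from the in-memory slot -- no database hit
  (slot-value p 'age))

(defmethod (setf person-age) (new-age (p cached-person))
  ;; write-through: update the cache, then flush to the store
  (setf (slot-value p 'age) new-age)
  (setf (gethash (person-name p) *store*) new-age)
  new-age)
```

With this policy every read is a memory access, and the store only sees writes, which is exactly the trade-off discussed below for read-heavy workloads.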

You'd have to think through the implications of this strategy, though. It works great if your data is read-only or only operated on from one thread. If your read-oriented algorithms can tolerate some incoherence (a slot value may be changed at any time), then you can ignore threading issues.

(Hmmmm... one hack might be to force a database read of cached slots when you are inside a transaction, so you can guarantee that any write to that page in a parallel transaction results in a restart. If you are just doing auto-commit, the read goes to the cached value.)

Ian

On Mar 6, 2008, at 10:02 PM, Robert L. Read wrote:

On Thu, 2008-03-06 at 10:10 -0500, Ian Eslick wrote:
I agree with Robert.  The best way to start is to use lisp as a query language and essentially do a search/match over the object graph.

The rub comes when you start looking at performance.  A linear scan of

I neglected to mention that in my use of Elephant, when I was attempting to run a commercial website, I was using the Data Collection Management
(DCM) stuff that you can find in the contrib/rread directory of the
project.

This system provides strategy-based directors.  That is, there is a
basic factory object for each collection of objects that implements
basic Create, Read, Update, Delete operations.

When you initialize a director, you specify a storage strategy:

*) In-memory hash (no persistence, for transient objects)
*) Elephant (no caching)
*) Cache backed by Elephant (read in memory, with writes immediately
flushed to the store)
*) Generational system, in which each generation can have its own
storage strategy.
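In use, picking a strategy might look something like the following sketch (all names here are illustrative, not the real DCM API from contrib/rread; each strategy is a class implementing the same Create/Read/Update/Delete protocol):

```lisp
;; Illustrative sketch only -- not the actual DCM code.
(defgeneric lookup-item (director key))
(defgeneric store-item  (director key object))

(defclass hash-director ()          ; in-memory hash, no persistence
  ((table :initform (make-hash-table :test 'equal)
          :reader director-table)))

(defclass caching-director (hash-director)
  ())  ; would also hold a handle to the Elephant store

(defmethod lookup-item ((d hash-director) key)
  ;; reads are always served from memory
  (gethash key (director-table d)))

(defmethod store-item ((d hash-director) key object)
  (setf (gethash key (director-table d)) object))

(defmethod store-item :after ((d caching-director) key object)
  ;; write-through: after updating the cache, flush to Elephant
  (declare (ignore key object))
  ;; (flush-to-store d key object) -- hypothetical, elided here
  )
```

The generational strategy would just be another director class whose methods delegate to a per-generation director.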

Everything Ian wrote in the last email about scanning and locality of
reference makes perfect sense, but it assumes that you don't have every
object cached.  That approach is therefore not very "Prevalence"-like in
its performance, though it is very "Prevalence"-like in its convenience.
Using DCM, or any other caching scheme where most of the objects are
cached, tends to give you the performance described in the IBM article
on Prevalence that I referenced.

However, DCM was written BEFORE Ian got the class indexing and
persistence working.  DCM is not nearly as pretty and clean as the
persistent classes.  You end up having to make storage decisions
yourself.

A perfect system might be persistent classes with really excellent
control over the caching/write-updating policy.

For any application, I would recommend using Ian's persistent classes
in the beginning stages of the project, and then, when your performance
tests reveal a problem, considering at that point whether to add
indexes, move to explicitly keeping a class in memory, or some other
solution.



_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel