Robert said:

> I'll go out on a limb and say that offering object-level caching is
> the single biggest performance enhancement we make for the most common
> cases.

A clarifying question: how did you ensure ACID properties in the DCM scenario in the presence of threading? Unless BDB or SQL knows about the reads you've done, you can't tell whether another transaction has clobbered data that you are currently using, because the reads come directly from memory.

For example, you can easily read the old 'balance' on the checking account, do your computing while someone else writes that same object, then write back an incorrect value.
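To make the failure concrete, here is a minimal sketch of that lost update, written in Python purely for illustration. The `Store` class and its `put_if_unchanged` method are hypothetical stand-ins, not anything in Elephant, DCM, BDB, or CLSQL; the version counter is just one assumed way the store could learn that a cached read has gone stale.

```python
class Store:
    """Hypothetical backing store: each key holds (value, version)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Unconditional write; bumps the version counter.
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def get(self, key):
        return self._data[key]  # (value, version)

    def put_if_unchanged(self, key, value, seen_version):
        # Write only if nobody has written since we read seen_version.
        _, current = self._data[key]
        if current != seen_version:
            return False
        self._data[key] = (value, current + 1)
        return True


store = Store()
store.put("checking", 100)

# Transaction A reads the balance into its in-memory cache.
cached_balance, cached_version = store.get("checking")

# Meanwhile transaction B withdraws 50 and commits.
store.put("checking", 50)

# A computes from its stale cached copy; a blind write of this value
# would silently undo B's withdrawal.
stale_result = cached_balance + 10

# With a version check, the stale write is refused instead.
accepted = store.put_if_unchanged("checking", stale_result, cached_version)
```

Here `accepted` ends up false, so A has to re-read and retry; without some check of this kind the store has no way to notice the conflict.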

Rucksack tracks changes by versioning objects in memory and rolling back newer versions when older versions are committed. This is a copy-on-write model which keeps everything in memory during the transaction, but then writes the txn log and a version of the object to disk, updating the in-memory 'valid' version as appropriate.
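A rough sketch of that kind of copy-on-write versioning, as I understand it, follows. It is in Python only for illustration and is not Rucksack's code: `VersionedObject`, `Transaction`, and the list used as a log are all hypothetical, and the conflict rule is simplified so that whichever transaction commits second is the one discarded.

```python
class VersionedObject:
    """Hypothetical object with one committed ('valid') state."""

    def __init__(self, state):
        self.valid = dict(state)
        self.version = 0


class Transaction:
    def __init__(self, log):
        self.log = log      # stand-in for the on-disk txn log
        self.copies = {}    # id(obj) -> (obj, version at first write, private copy)

    def write(self, obj, slot, value):
        # Copy on first write; later writes touch only the private copy.
        if id(obj) not in self.copies:
            self.copies[id(obj)] = (obj, obj.version, dict(obj.valid))
        self.copies[id(obj)][2][slot] = value

    def read(self, obj, slot):
        entry = self.copies.get(id(obj))
        return entry[2][slot] if entry else obj.valid[slot]

    def commit(self):
        # Validate: every object must still be at the version we copied.
        for obj, start_version, _ in self.copies.values():
            if obj.version != start_version:
                self.copies.clear()   # roll back by dropping private copies
                return False
        # Log each new version, then make it the valid one.
        for obj, _, private in self.copies.values():
            self.log.append(dict(private))
            obj.valid = private
            obj.version += 1
        self.copies.clear()
        return True


log = []
account = VersionedObject({"balance": 100})
t1, t2 = Transaction(log), Transaction(log)
t1.write(account, "balance", 110)
t2.write(account, "balance", 50)
first = t1.commit()
second = t2.commit()
```

Neither transaction takes a lock while it works; the conflict only surfaces at commit time, where `t2` finds the object has moved on and is rolled back.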

Leslie had a good related e-mail on this topic a few days ago:

I don't know what the best decision might be here.
But I have a use case that might help; it has the following
features:

 * I access the slots of two persistent objects.

 * The number of the slots and the times requested
   together produce very bad performance (think seconds)
   even with PM txn caching (for comparison, BDB is about
   three times faster)

 * The environment is multi-threaded (web server), but the
   slots won't be changed by any other process.

 * Ideally the slots would be cached only for this one
   function and the functions called by it (and only
   per-invocation, i.e. slot caches get refreshed right at
   the beginning of the function).

 * This is currently the only place in my app where I would
   need the performance advantages of slot caching. In all
   other places ACID is highly preferred and speed is sufficient.

 * The desired behaviour can be somewhat modelled by CLSQL's
   OO interface:

     - get the objects from the DB at the beginning

     - work with those in-memory objects

     - write back the values to the DB at the end of the process

   The difference is that I don't want the whole object (other slot
   values of it might be changed from outside!) but only a few
   selected slots.

I think we can basically do this today. A refresh command simply reads all of the cached slots from the DB (done inside a transaction, this is thread safe and avoids the aforementioned problem). You then operate on the cached data, and nothing happens in the transaction; at the end you do a 'save' and those cached slots get written to disk. I think this meets Leslie's use case, and I think it's an hour or two to implement on top of what is already there.
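The shape of that refresh/save cycle is sketched below in Python, only to show the idea. Everything here is assumed for illustration: `SlotCache` is a hypothetical name, the dict keyed by (object id, slot name) stands in for the DB, and the lock stands in for a real transaction.

```python
import threading


class SlotCache:
    """Hypothetical per-invocation cache over a few selected slots."""

    def __init__(self, db, lock, oid, slots):
        self.db = db        # stand-in DB: {(oid, slot): value}
        self.lock = lock    # stand-in for a transaction
        self.oid = oid
        self.slots = list(slots)
        self.values = {}

    def refresh(self):
        # One "transaction" that reads every cached slot.
        with self.lock:
            self.values = {s: self.db[(self.oid, s)] for s in self.slots}

    def save(self):
        # One "transaction" that writes back only the cached slots.
        with self.lock:
            for slot, value in self.values.items():
                self.db[(self.oid, slot)] = value


db = {("acct", "balance"): 100, ("acct", "hits"): 0, ("acct", "owner"): "leslie"}
cache = SlotCache(db, threading.Lock(), "acct", ["balance", "hits"])

cache.refresh()
for _ in range(1000):
    cache.values["hits"] += 1      # pure in-memory work, no DB traffic

db[("acct", "owner")] = "robert"   # outside change to a slot we did not cache
cache.save()
```

Because only the selected slots are written back, the outside change to the uncached slot survives the save. Anything another thread wrote to the cached slots between refresh and save would still be overwritten, which is why this only suits the case where nobody else changes them.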

> However, I don't know if this is more important than a native-lisp
> backend, or a query-language. For the next year at least I am working
> at a job rather than working on my lisp application; and even then I was
> happy with the performance I was getting out of DCM.  So I personally
> don't have performance need that drives anything.  I wish I knew how
> many new users we would have from better performance vs. a native-lisp
> backend vs. a query-language, or what our existing users would prefer.

My two dollars on this topic is that the most interesting thing to improve adoption and overall utility is a lisp-only backend to get going with. The most interesting value to the current users, including myself, is a query system that manages and abstracts some of the performance query hacks that today you have to write yourself in lisp, often over and over.

I think of the query system, by the way, as a DSL (domain-specific language) extension of lisp, not a SQL syntax. So it's not an either/or; it's exactly what Lisp was meant to do: enable linguistic abstraction that makes thinking about a given problem easier. That's what I think when I hear 'lisp as the query language'.

Rucksack strikes me as the best way to start on the lisp-only front, because so much is there. It's a non-trivial port/adaptation so someone needs to be willing to put in a week or two (at least) of serious effort.

I think we may also be able to change it so that it only writes a transaction log and doesn't write the underlying DB unless something is flushed from the cache. What I like about Rucksack for a more prevalence-style model (and maybe I'm misreading this and it's not flushing objects to disk on each write) is that it already implements versioning as its transaction model, which gets around fine-grained locking performance problems. If we add in Robert's DCM ideas about having a cache instead of the whole DB in memory, then we could imagine writing flushed objects to disk, effectively syncing the in-memory objects to disk incrementally rather than having to do a full snapshot every so often.
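As a sketch of that last idea, assuming a simple LRU cache: every write is appended to a log, and the underlying DB is only touched when a modified object falls out of the cache. This is Python for illustration only; `LoggedCache` and its fields are hypothetical and say nothing about how Rucksack or DCM are actually built.

```python
from collections import OrderedDict


class LoggedCache:
    """Hypothetical cache: log every write, flush to the DB only on eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.dirty = set()          # keys changed since they were last flushed
        self.log = []               # stand-in for the on-disk txn log
        self.disk = {}              # stand-in for the underlying DB

    def write(self, key, value):
        self.log.append((key, value))   # durability comes from the log
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self._evict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.disk[key]          # cache miss: load from the DB
        self.cache[key] = value
        self._evict()
        return value

    def _evict(self):
        while len(self.cache) > self.capacity:
            key, value = self.cache.popitem(last=False)
            if key in self.dirty:       # incremental sync of one object
                self.disk[key] = value
                self.dirty.discard(key)


store = LoggedCache(capacity=2)
store.write("a", 1)
store.write("b", 2)
store.write("c", 3)                       # "a" is evicted and flushed
disk_after_writes = dict(store.disk)
value_of_a = store.read("a")              # reloads "a", evicting "b"
```

Recovery after a crash would mean replaying the log over whatever had already been flushed; that part is left out here.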

Regards,
Ian







_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
