On Fri, 2008-05-16 at 11:26 -0400, Ian Eslick wrote:
> Prevalence is much, much faster because you don't have to flush data
> structures on each commit, so cl-prevalence performance with
> Elephant's data and transaction abstractions would be a really nice
> design point.

I wonder if we would get some of this benefit from a Rucksack adaptation?
I'd like to take this opportunity to explain something. We already have a "prevalence-ish" auxiliary system in the contrib/rread/dcm directory. It implements what you might call "prevalence"-style caching: everything is stored in a hash table, and any writes or additions are written immediately to Elephant (which ends up doing the write I/O) before control is returned. It's also thread-safe, I think.

I wrote it right after I got the CL-SQL backend working. That is in fact part of the reason I never worried about making the CL-SQL backend faster: the caching took care of almost all of my needs. I didn't mind paying for the writes, since each one was typically in response to a human being clicking a browser button, and the writes were certainly faster than that.

I called this system "DCM", for Data Collection Management. In fact it implements what you might call a Tier 2 or "Business Object" cache. It writes objects directly to btrees and creates its own keys. In a way, it does what Ian's class-based persistence does. However, I haven't touted it because it isn't very good. It has the following drawbacks:

1) I was relatively new to Lisp when I wrote it.
2) It makes no use of Ian's new stuff (which is newer than DCM).
3) It uses SBCL-specific locking.
4) It does not have a lot of tests.
5) It does not allow limits on the cache size; it assumes you have enough memory for those classes for which you use the in-memory caching strategy.
6) I think its use of btrees could be pretty bad.
7) It might not run now, as it has not been under test for a while.

However, the one thing it does really well, and that I don't think we have at the current level, is object-level caching. (If I am wrong about that, please enlighten me.) I haven't looked at Rucksack yet, but I would venture that if we wanted to write a Prevalence-style system, we could examine DCM and either improve it or take some of its approach as a starting point.
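To make the write-through idea concrete, here is a minimal sketch of a DCM-style cache. The class and function names are hypothetical (this is not the actual DCM API); the only Elephant call assumed is get-value on a btree. Reads hit only the in-memory hash table; writes update memory and then persist synchronously before returning:

```lisp
;; Sketch of a DCM-style write-through object cache (hypothetical names).
(defclass write-through-cache ()
  ((table :initform (make-hash-table :test 'equal) :accessor cache-table)
   (btree :initarg :btree :accessor cache-btree)))

(defun cache-get (cache key)
  "Read from memory only; the cache is assumed to hold every live object."
  (gethash key (cache-table cache)))

(defun cache-put (cache key object)
  "Update memory, then write through to the backing btree before returning."
  (setf (gethash key (cache-table cache)) object)
  (setf (get-value key (cache-btree cache)) object) ; Elephant does the I/O
  object)
```

The point of the pattern is that read latency is pure hash-table lookup, while every write still pays one synchronous trip through Elephant, which is acceptable when writes are human-driven.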
One way in which this differs from Prevalence is that it is a "write-through cache", not a "journaling transaction system." A true journaling system would have less I/O and would allow more control over checkpointing strategy.

One stylistic point that I'm undecided on is which of these styles is better:

1) You have an explicit "manager" or "director" that is responsible for Create, Read, Update, Delete (CRUD) operations on a class of objects. The managed class is not itself inherently persistent; it is persisted when you call "register" on the managed object via the manager. When you instantiate a manager, you specify a caching strategy by subclassing a manager class that follows a particular strategy.

2) You use the MOP to make a class really intelligent, let it be re-defclassed with different settings, and implement lots of slot keywords to say which slots are transient, persistent, etc. In general, you think of the "class", when treated as an actual data value as it is in Lisp, as responsible for caching and persistence, thus doing the same job that the "manager" does in the other model.

What Elephant has now is the latter; DCM is the former. DCM will be familiar to your typical Java/C# software engineer.

Given all of the discussion around performance, it is hard for me to personally sort out how important performance really is, and how best to get it: not because it is particularly hard, but because we don't seem to have anybody with an urgent use-case, and because object-level caching is so effective (at least for me). So I'll go out on a limb and say that offering object-level caching is the single biggest performance enhancement we could make for the most common cases. If we agree on that, then we can start to imagine how we would most like to implement it. I personally don't have any objection to the "manager" pattern as a discrete object. However, I think most of our users are happier with the "defpclass" approach.
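For contrast, the two styles might look roughly like this. The manager API in the first half is entirely hypothetical; the second half follows Elephant's actual defpclass form, with :transient marking a non-persistent slot:

```lisp
;; Style 1: an explicit manager owns CRUD and caching (hypothetical API).
(defvar *user-manager*
  (make-instance 'in-memory-caching-manager :managed-class 'user))
(register *user-manager* (make-instance 'user :name "alice"))

;; Style 2: the class itself is persistent via the MOP, as with
;; Elephant's defpclass; slot options declare persistence behavior.
(defpclass user ()
  ((name :initarg :name :accessor user-name)
   (scratch :initform nil :transient t)))
```

In style 1 the caching strategy lives in the manager subclass you choose; in style 2 it would have to live in the class definition itself, which is what the defpclass extension discussed below would provide.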
So I think the ideal situation would be to expand the "defpclass" macro to allow one to specify a caching strategy, and the parameters that control such a strategy. Whether this should be slot-based, object-based, or some combination, I'm not sure. It ought to be quite simple to create some performance tests that clarify the performance of this approach.

However, I don't know if this is more important than a native-Lisp backend or a query language. For the next year at least I am working at a job rather than working on my Lisp application; and even then I was happy with the performance I was getting out of DCM. So I personally don't have a performance need that drives anything. I wish I knew how many new users we would gain from better performance vs. a native-Lisp backend vs. a query language, or which our existing users would prefer.

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel