Hi All, I was inspired the other day to come up with a minimal, quick-and- dirty, all-lisp data store for Elephant based on cl-prevalence. The problem with cl-prevalence is that every access to a persistent slot has to be explicitly transactionally protected so it can be recovered. This is onerous and violates the abstractions provided by the rest of the elephant data stores.
The general idea is to hide the cl-prevalence transaction model inside the existing meta-object protocol. Instead of the usual hash table, we use trees from cl-collections as our replacement for btrees and indexes - this gives us reasonably efficient successor/predecessor ops and makes it easier to implement the mapping and cursor APIs. It's not going to be nearly as fast as a full prevalence solution, but it will be faster than BDB and should be much easier to install and work with on a single-image basis. The new db-clp store requires a patches to cl-prevalence and cl-containers. I'll see about getting those put into the mainstream, but can send them to anyone interesting in hacking on this in the meantime. I checked a prototype of this into elephant-1.0 today; it does not interfere with existing functionality at all and I'm not sure yet whether it will be part of 1.0. You can open a new store, subject to the following caveats, by: In elephant root: ln -s src/contrib/eslick/db-clp/ele-clp.asd ele- clp.asd (open-store '(:CLP "/home/me/db/")) where /home/me/db/ is a fresh directory. 85%+ of the tests pass and a ton of stuff does work: persistent classes, btrees, dup-btrees, indexed-btrees, mapping (mostly), and class indexing (mostly). However, there are still some very serious holes. The ones I'm aware of are: - You can only create, but not re-open a store (this is due to a bootstrapping problem in recreating persistent instances when loading a snapshot) - No cursors are supported (all those tests fail), but should be easy as there is a good match between the RB tree and the btree abstractions but you will need to add a couple of functions to cl- containers. - On-line recovery has not been tested. - I'm unsure of how cl-prevalence guarantees global transaction serializability - I don't see locks anywhere in the code. - I faked out a bunch of serializer tests by including the default serializer, because they depend on buffer streams which the xml serializer doesn't support; the tests should be fixed to be more general and not rely on buffer streams. - Some issues in schema evolution tests Performance issues: - Reads need to be serialized so are currently considered transactions for simplicity, but they write a transaction log - they should be rewritten to only use the serialization mechanism but not write a log and the grabbing of a lock during serialization should be done once per with-transaction call. - with-transaction is a no-op, a significant performance enhancement would be to bundle a set of primitive operations such as tx-write- slot, tx-remove-kv and write a single prevalence log entry for them. This would avoid a disk write per primop. - I started using splay trees, retreated to RB trees due to bugs and finally retreated to binary-search-trees due to more bugs. Moving to a more efficient, balanced tree data structure will improve the performance of all operations. However, several tests use linear insertion, reducing the asymptotic behavior of the tree to that of a list - it really grinds the tests to a halt. Obvious next steps, in order of increasing difficulty: - Implement cursor API - Fix red-black or splay tree implementation in cl-containers - Figure out how to load from a snapshot - Use with-transaction to improve performance - Read transaction optimizations Regards, Ian _______________________________________________ elephant-devel site list elephant-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/elephant-devel