First off, let me appologize; I should have given way more context in my original post. boost members are not psychic. :-)
On Tue, Nov 12, 2002 at 05:31:46PM +0200, Bohdan wrote: > > "Wesley W. Terpstra" <[EMAIL PROTECTED]> wrote in message > news:20021112113226.GA466@;ito.tu-darmstadt.de... > > Good afternoon! > > > > I am looking at making an stl-compatible wrapper around a key-value database. > > > > It seems to me that such a wrapper would be widely useful since: > > 1. stdc++ algorithms could operate on the databases > > 2. switching a map<...> that grew too large to disk-backed becomes trivial > > 3. old reliable stl code could be reused on disk > > 4. a very gentle learning curve to existing C++ developers > > 5. it would be highly convenient to use > > 6. quite likely clever (ab)uses which I do not foresee would be possible I am answering your message out of sequence because I think it will be more clear. > The other idea is that transaction object is needed here. Yes, I already have that solved: You get a transaction object from the environment. You get a database object from the environment. You combine the two to get a session object. The session looks like an stl map (it issues iterators). When you commit the transaction all the sessions using that transaction are invalidated as are all the iterators that they issued. ... the problem I have is the calling of member methods through the iterator. I know I could do it with a several approaches, I am trying to find the most elegant with minimal overhead. > ObjectDatabases are very painful things. Unfortunately > they are not too popular nowadays. The reason for > this is simple, they are extrimally difficult to implement > and use (at least for c++). Yes they are. Fortunately, I am not trying to write one. I have a *very* narrow functionality target. When you put it in the container, then it is on disk and in my control. When you take it out, it is not. > > Obviously, any scheme like this would require serialisation of > > the key/data pairs. My solution thus far has been to include a > > SerialTraits<T> concept which provides a conversion method. Then > > the databases look like: MapDatabase<KeyTraits, DataTraits> db; > > You can use new boost::serialization library. I have considered this and rejected it (the coupling, not boost::serialization). I would rather leave this entirely in user control. It would be a matter of adding a single template class if they wished to bridge the two themselves, so there is no functionality lost. Further, it is important to control the serialized format so that the lexical sort order has desirable properties. This fine grained control is not available under boost::serialization without writing a stream object for each desired lexical-sort seriailization. Also, this way I depend on less things. > I have doubts if you can use std::map interface for your database class. I agree that it is not designed for this purpose, but I think there are many beneficial emergent properties. It does not actually have to exactly conform; just conform with the subset used in existing practice. I know from experience with a previous product that even something that closely approximates a std::map is highly useful. I just want to bring that approximation a bit closer so that code really can ignore the difference. > > Although one might niavely claim mmap() could keep the representation of > > RAM on the disk, I would disagree since this would still impose arcane > > restrictions on the class member variables. > > Well, there are two ways: > 1. You need disk to reduce memory usage. > 2. You need disk to persist objects. > I'm not sure which one is yours. Did you ? Did I ? I desire both of the two properties, you want me to choose? :-) Is your comment here pertaining to mmap()? > > This would unfortunately break any code which took a T& or T* > > from an iterator and held on to it. > > If you want to allow pointers and use them after application restart > than use smart pointers: > > I've heard something about some system/processor tricks which allow > to persist pointers, but i do not think it is good way. You are considering serialization. I have no interest in this topic. My implementation presumes that you have picked one of the many available serialization tools or rolled your own. This is especially not-so-important because I do not plan on supporting storing anything other than by-value. Further, since it looks like the stl I know all the type information and there is no polymorphism. The T& and T* I was refering to are those obtained from the map::iterator class that is walking records. The user might dereference this iterator and take a reference to the object. Rather than telling them to use a smart pointer, better would be to say: just keep the iterator! I am just concerned with breaking existing stl code. If possible, I would rather this could work, but I don't see how. > > I am duplicating the read cache of the database in a wrapper. > > (on the plus side I am also saving deserialisation work) > > I'm not sure if you can live with read-only cache. When you are > doing intensive changes than cache should also support changes. > i do not see problems with it. Sorry, I was not clear. Of course my cache is read-write, but I am reducing the usefulness of the databases read cache. > Imagine that serializing of some object can throw exeption ... Again, serialization... Not my problem. Even if it threw an exception, I do not think this would pose any problems; your transaction should have been wrapped in a retry block. All the memory handed out from my database is memory managed in a way resiliant to exceptions (since I throw them too! deadlock for instance). > > Solution #3: fuzzy template+inheritance tricks! > > > > I figure there might be some clever way to return an object which > > looks like the data object, but really is not. Maybe by inheriting a > > template class from the contained class. Returning these might be > > able to do what is needed; eg: on destruction, write back to the > > database library. > > ODMG proposes new/delete operators for persistent object creation/destruction. > I this case you can construct your object is somewhat bigger memory chunk, > which can contain some other implementation specific per-object information. I do not think this will help; the returned object would be living on the stack which puts control of it's new/delete out of my hands. However, with my trick (deriving from the class) I can still catch the destruction even though it is on the stack. > > This seems like a good idea, but it is fraught with complications. > > Consider two iterators i&j which (happen by chance to) point to the > > same object. > > i->set_a(j->set_b(4) + 2); > > > > Oops. You would expect both changes to work since they are > > presumably modifying different member variables. However, since > > i-> and j-> both read from disk and deserialised to a temporary, > > we are modifying two different temporaries. Therefore only one of > > the changes (whoever's object destroys last) will be made. > > If your class is going to have concurrent transactions, than your > example shows how concurrency works. These iterators are in the same transaction. > if i & j belong to same transaction (or no transactions) than > assert(&(*i)==&(*j) ). It can be implemented if MapDatabase::iterator is > smart enough "smart pointer" :) Well if it is a smart pointer than you are using solution #2: cache. But this is solution #3, I was trying to avoid deserialized cache and only use the database's underlying sector-oriented cache. So, the same object got deserialized twice and returned by value on the stack twice. > So if you want to hear my humble opinion: > try to implement something small and limited in use first. Well, this is my third attempt. :-) I implemented a stl-like wrapper for libdb3 at a place I worked, but I could not keep that code. It took approach "#1: don't do that!" to calling methods through the iterator. After that, I wrote a custom database without a nice api because I had a very specific use which I could not get fast enough with btrees and hashes. It was not transactional, however, and I later discovered an even better layout (nearly 0 seeks!). This newest attempt is a fully transaction database with several unusual backing data structures. I have a nice raw iterator class that operates on memory chunks for the key&data. It is mostly complete, but lacks a user-friendly api. I am trying to get it to look more like the stl because the wrapper I wrote for my previous employer was invaluable. And all the reasons at the top. Thanks for your time! _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost