Hi,

this discussion makes me think that we should build MVCC directly into XDBM. I think it should work independently of the underlying store (JDBM, AVL, HBase).
Let me outline my idea:

Instead of MasterTable<ID, Entry> we use a MasterTable<ID, SortedMap<Long, Entry>>. It stores multiple versions of the entry in a sorted map; the key of the sorted map is the version number. The same applies to the index tables.

We need some global version information which contains
- a version counter (sequence or timestamp)
- the latest valid version
- a list of writes in progress, and
- a list of failed writes

At the beginning of a write or read operation we take a snapshot of this global version info. This snapshot is used to get a consistent view of the data for the whole operation. For each piece of data we read, we use the highest version that is less than or equal to the "latest" version, excluding versions contained in the "in progress" or "failed" list. For a search operation this snapshot can also be used by the cursor, so while fetching all results there would always be a consistent view of the data.

A write operation works as follows:
- Acquire the next version number; the version number is added to the list of "writes in progress" (begin transaction).
- All micro-writes to the index and master tables are performed with this version.
- All write operations add data; deletes/drops result in the addition of an empty value <Version, null> (append-only).
- For commit, the version number is removed from the "writes in progress" list and the latest valid version is set to it (unless the latest valid version is already higher).
- If the write fails, the version number is moved to the "failed writes" list.

At some point we have to do some garbage collection and delete old versions. If the version is a timestamp, this can be done by the next write, by deleting versions older than X days.

I think this way we can also support RFC 5805 LDAP Transactions.

A big disadvantage may be performance, as we always have to read and write all versions of an entry from/to the underlying store.

Thoughts?

Kind Regards,
Stefan
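
P.S. To make the idea a bit more concrete, here is a rough Java sketch of the version bookkeeping and the snapshot read rule described above. It is only an illustration under a few assumptions: the class and method names (VersionInfo, Snapshot, VersionedTable, begin, commit, fail) are made up for this mail and are not existing XDBM API, the table lives in plain in-memory maps instead of JDBM/AVL/HBase, the version is a simple sequence, and garbage collection is left out. A delete is stored as an empty Optional, standing in for the <Version, null> marker.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Immutable copy of the global version info, taken when an operation starts. */
final class Snapshot {
    final long latest;                 // latest valid version at snapshot time
    private final Set<Long> excluded;  // versions that were in progress or failed

    Snapshot(long latest, Set<Long> excluded) {
        this.latest = latest;
        this.excluded = excluded;
    }

    boolean isVisible(long version) {
        return version <= latest && !excluded.contains(version);
    }
}

/** Global version info: counter, latest valid version, in-progress and failed writes. */
final class VersionInfo {
    private long counter = 0;
    private long latest = 0;
    private final Set<Long> inProgress = new HashSet<>();
    private final Set<Long> failed = new HashSet<>();

    /** Begin a write: acquire the next version and mark it as in progress. */
    synchronized long begin() {
        long version = ++counter;
        inProgress.add(version);
        return version;
    }

    /** Commit: drop the version from "in progress" and raise "latest" if needed. */
    synchronized void commit(long version) {
        inProgress.remove(version);
        if (version > latest) {
            latest = version;
        }
    }

    /** Failure: move the version from "in progress" to "failed". */
    synchronized void fail(long version) {
        inProgress.remove(version);
        failed.add(version);
    }

    synchronized Snapshot snapshot() {
        Set<Long> excluded = new HashSet<>(inProgress);
        excluded.addAll(failed);
        return new Snapshot(latest, excluded);
    }
}

/** Append-only table: every key maps to a sorted map of version -> value. */
final class VersionedTable<K, V> {
    private final Map<K, NavigableMap<Long, Optional<V>>> rows = new ConcurrentHashMap<>();

    void put(K key, long version, V value) {
        rows.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>())
            .put(version, Optional.ofNullable(value));
    }

    /** A delete only appends an empty marker for this version. */
    void delete(K key, long version) {
        put(key, version, null);
    }

    /** Returns the newest value visible in the snapshot, or null if there is none. */
    V get(K key, Snapshot snapshot) {
        NavigableMap<Long, Optional<V>> versions = rows.get(key);
        if (versions == null) {
            return null;
        }
        // Walk from the snapshot's latest version downwards, skipping excluded versions.
        for (Map.Entry<Long, Optional<V>> e
                : versions.headMap(snapshot.latest, true).descendingMap().entrySet()) {
            if (snapshot.isVisible(e.getKey())) {
                return e.getValue().orElse(null);
            }
        }
        return null;
    }
}
```

A reader that took its snapshot before a later write committed keeps seeing the old value for the whole operation, which is the consistent view a search cursor would need. The sketch also shows the cost mentioned above: every read has to look at the version map of the entry, not just a single value.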
