After a second thought, it might be simpler to use 2 tables to store the
references to the current BTrees and to the modified btrees. Here is a a
small diagram exposing the data structure needed to manage the atomic
updates of btrees :
Mavibot header
+-------------------------------------+
| PageIO size |
+-------------------------------------+
| number of managed btrees |
+-------------------------------------+
| offset of the current Btree offsets |--+
+-------------------------------------+ \
| offset of the new BTree offsets |----)--------------+
+-------------------------------------+ / |
| |
+-------------------------+ |
| |
V V
+-------------------------+ +-------------------------+
| next Offset |---+ | next Offset |---+
+-------------------------+ | +-------------------------+ |
| size | | | size | |
+-------------------------+ | +-------------------------+ |
| +---------------------+ | | | +---------------------+ | |
| | first Btree offset | | | | | first Btree offset | | |
| +---------------------+ | | | +---------------------+ | |
| | second BTree offset | | | | | second BTree offset | | |
| +---------------------+ | | | +---------------------+ | |
| . | | | . | |
| . | | | . | |
| . | | | . | |
| +---------------------+ | | | +---------------------+ | |
| | Nth Btree offset | | | | | Nth Btree offset | | |
| +---------------------+ | | | +---------------------+ | |
+-------------------------+ | +-------------------------+ |
| |
+----------------+ +----------------+
| |
V V
+-------------------------+ +-------------------------+
| next Offset |---/ | next Offset |---/
+-------------------------+ +-------------------------+
| +---------------------+ | | +---------------------+ |
| | N+1th Btree offset | | | | N+1th Btree offset | |
| +---------------------+ | | +---------------------+ |
| | N+2th BTree offset |------+ | | N+2th BTree offset | |
| +---------------------+ | | | +---------------------+ |
| . | | | . |
| . | | | . |
| . | | | . |
| +---------------------+ | | | +---------------------+ |
| | last Btree offset | | | | | last Btree offset | |
| +---------------------+ | | | +---------------------+ |
| | | | |
| | | | |
+-------------------------+ | +-------------------------+
|
+---------------------+
|
V
Btree Header
+------------------+
| revision |
+------------------+
| nbElems |
+------------------+
| rootPage offset |---+
+------------------+ |
| pageSize | |
+------------------+ |
| btree name | |
+------------------+ |
| key serializer | |
+------------------+ |
| value serializer | |
+------------------+ |
| dups allowed | |
+------------------+ |
|
+
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
+ +
| |
V V
Node Leaf
+-----------------+ +-------------------+
| revision | | revision |
+-----------------+ +-------------------+
| nbElems | | nbElems |
+-----------------+ +-------------------+
| dataSize | | dataSize |
+-----------------+ +-------------------+
| child[0] | | value[0] length |
+-----------------+ +-------------------+
| key[0] length | | value[0] |
+-----------------+ +-------------------+
| key[0] | | key[0] length |
+-----------------+ +-------------------+
| child[1] | | key[0] |
+-----------------+ +-------------------+
| key[1] length | | value[1] length |
+-----------------+ +-------------------+
| key[1] | | value[1] |
+-----------------+ +-------------------+
| . | | key[1] length |
| . | +-------------------+
| . | | key[1] |
+-----------------+ +-------------------+
| child[N-1] | | . |
+-----------------+ | . |
| key[N-1] length | | . |
+-----------------+ +-------------------+
| key[N-1] | | value[N-1] length |
+-----------------+ +-------------------+
| child[N] | | value[N-1] |
+-----------------+ +-------------------+
| key[N-1] length |
+-------------------+
| key[N-1] |
+-------------------+
As you can see, we maintain 2 versions of the BTree offsets, and we
switch from the current one to the modified one by changing the first
page, so it's a one step operation.
Btw, the Node has been simplified : there is no need to keep the lentght
of page offsets, they are always 8 bytes long.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com