I have noticed one oddity in TDB that I want to investigate.

There is an oddity - in fact, it's a design flaw. This message is a "full disclosure" message.

It's very hard to construct a test case that reproduces these conditions reliably; this was found by code analysis and a sense of paranoia gained during time on a fault-tolerant systems project in the telecoms sector a while ago.

== Analysis

When preparing a transaction commit, the transactional node table updates the master node-hash-to-node-id index, and that update risks damaging the on-disk structure.
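As a rough sketch of the structure involved (the class and method names here are illustrative, not TDB's actual API): the node table maps a hash of a node's content to a node id, and at prepare time the entries accumulated inside the transaction are pushed into the shared master index, which in TDB is B+Tree-backed on disk.

```python
# Hypothetical sketch, not TDB code: a node table that defers updates to
# the master hash -> node-id index until commit prepare.
import hashlib

class NodeTable:
    def __init__(self):
        self.master = {}    # hash -> node id (a B+Tree on disk in TDB)
        self.pending = {}   # entries added inside the current transaction
        self.next_id = 0

    def intern(self, node: str) -> int:
        """Return the id for a node, allocating one in the transaction if new."""
        h = hashlib.sha1(node.encode()).hexdigest()
        for table in (self.master, self.pending):
            if h in table:
                return table[h]
        self.pending[h] = self.next_id
        self.next_id += 1
        return self.pending[h]

    def prepare(self):
        """Commit prepare: push pending entries into the master index.
        In TDB this is the step that updates the on-disk B+Tree, and
        so the step at which a partial block-split write can do damage."""
        self.master.update(self.pending)
        self.pending.clear()
```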

Entries are only added to this structure during prepare. On-disk structures can only be corrupted when a B+Tree update causes a block split and only part of the split is written back; updating a single block only risks inaccessible junk in that one block. Block splits occur roughly once in every 100 updates.

If a block split is only partially written back, the on-disk tree is broken. That can happen if write caching flushes one block but not the other (even though they have the same last-write time, so they are adjacent in the LRU queue) and there is then a system crash, or if a filesystem sync writes one block but not the other due to a power failure.
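To make the failure mode concrete, here is a toy model (my own sketch, not TDB code) of a one-level split: the overfull block is divided into two, and the parent gains a reference to the newly allocated block. If a crash means the new block never reaches disk while the parent update does, the parent points at junk and part of the tree becomes unreadable.

```python
# Toy model of why a half-written block split breaks the tree.
# "disk" is a dict of block id -> sorted keys; "parent" lists child block ids.

def split_and_write(disk, parent, child_id, new_id, lose_new_block=False):
    """Split the overfull child into two blocks and update the parent.
    With lose_new_block=True, simulate a crash that persisted the parent
    update and the rewritten left block, but not the new right block."""
    keys = disk[child_id]
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    disk[child_id] = left            # rewrite of the existing block
    if not lose_new_block:
        disk[new_id] = right         # write of the newly allocated block
    parent.append(new_id)            # parent update survives either way

def lookup(disk, parent, key):
    """Find the block holding key; raises KeyError on a dangling reference."""
    for child_id in parent:
        if key in disk[child_id]:
            return child_id
    return None
```

With `lose_new_block=True`, looking up any key that moved to the new block raises `KeyError`: that is the "inaccessible junk" described above, except that in the real structure it is persistent on-disk damage rather than an exception.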

The same problem could arise in the non-transactional system, although B+Tree structural problems have not been reported there.

This applies to both direct and mapped modes, although with different probabilities of damaging on-disk data.

== Plan

It's a design flaw and the fix is to reimplement the transactional node table.

== 2.7.1 Release

On balance, I don't think this should hold the release up; better to release now and get another release out relatively soon.

1/ There are fixes elsewhere we want to get out.

2/ There are fixes in TDB. 0.9.1 is at least better than 0.9.0 (the design flaw is already in 0.9.0), so holding up the TDB part of the release would actually withhold other fixes from users.

3/ The reimplementation needs thorough testing; releasing lightly tested code poses more risk to data than going with what there is.

        Andy
